... it is her task to describe the path to the instruction follower, who cannot see the reference path. Our system learns to interpret these navigational directions, without access to explicit linguistic ... θᵀφ(s, a) most closely approximates E[R(s, a)]. To learn these weights θ we use SARSA (Sutton and Barto, 1998), an online learning algorithm similar to Q-learning (Watkins and Dayan, 1992). Algorithm ... spatial word. The top row shows the weights of allocentric (landmark-centered) features. For example, the top-left figure shows that when the word "above" occurs, our policy prefers to go to the north...
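
To make the learning step mentioned earlier concrete, the following is a minimal sketch of a single SARSA update for a linear value estimate θᵀφ(s, a). It is not the algorithm as given in this paper: the step size `alpha`, the discount factor `gamma`, the function names, and the two-dimensional feature vectors are all assumptions chosen for illustration.

```python
def q_value(theta, phi):
    """Linear value estimate: the dot product of weights theta and features phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi))


def sarsa_update(theta, phi, reward, phi_next, alpha=0.1, gamma=1.0):
    """Return new weights after one on-policy SARSA step.

    phi      -- features of the (state, action) pair just taken
    phi_next -- features of the next (state, action) pair chosen by the same policy
    alpha    -- step size (assumed value, not taken from the paper)
    gamma    -- discount factor (assumed value, not taken from the paper)
    """
    # Temporal-difference error between the bootstrapped target and the current estimate.
    td_error = reward + gamma * q_value(theta, phi_next) - q_value(theta, phi)
    # Move each weight in proportion to its feature's activation.
    return [t + alpha * td_error * f for t, f in zip(theta, phi)]


# Hypothetical example: two features, only the first is active for the action taken.
theta = sarsa_update([0.0, 0.0], [1.0, 0.0], reward=1.0,
                     phi_next=[0.0, 1.0], alpha=0.5)
print(theta)
```

Because SARSA is on-policy, `phi_next` describes the action the learner actually takes next. Q-learning would instead use the features of the highest-valued next action.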