... representation, our problem is to find a parameter vector θ ∈ ℝ^K for which Q(s, a) = θ^T φ(s, a) most closely approximates E[R(s, a)]. To learn these weights θ we use SARSA (Sutton and Barto, 1998), an online ... Map Task corpus we only observe expert route-following behavior, but are not told how portions of the text correspond to parts of the path, leading to a difficult learning problem. The semantics ... the reference path. Our task is to build an automated instruction follower. Whereas the original participants could speak freely, our system does not have the ability to query the instruction giver...
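To make the linear value function Q(s, a) = θ^T φ(s, a) and the SARSA weight update mentioned above concrete, here is a minimal sketch of the textbook SARSA rule with linear function approximation. It is not taken from the excerpt: the function names `q_value` and `sarsa_update`, the step size `alpha`, the discount `gamma`, and the `terminal` flag are all illustrative assumptions, and the feature vectors in the usage lines are toy values.

```python
from typing import List

Vector = List[float]


def q_value(theta: Vector, phi: Vector) -> float:
    """Linear action-value estimate Q(s, a) = theta^T phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi))


def sarsa_update(
    theta: Vector,
    phi_sa: Vector,        # features phi(s, a) of the action just taken
    reward: float,         # reward observed after taking a in s
    phi_next: Vector,      # features phi(s', a') of the next chosen action
    alpha: float = 0.1,    # step size (assumed; not given in the text)
    gamma: float = 0.9,    # discount factor (assumed; not given in the text)
    terminal: bool = False,
) -> Vector:
    """Return the weights after one on-policy SARSA step.

    theta <- theta + alpha * (r + gamma * Q(s', a') - Q(s, a)) * phi(s, a)
    """
    target = reward if terminal else reward + gamma * q_value(theta, phi_next)
    td_error = target - q_value(theta, phi_sa)
    return [t + alpha * td_error * f for t, f in zip(theta, phi_sa)]


# Toy usage with K = 2 features (values are made up for illustration).
theta = [0.0, 0.0]
theta = sarsa_update(theta, [1.0, 0.0], 1.0, [0.0, 1.0], alpha=0.5)
print(theta)  # [0.5, 0.0]
```

Because the update is applied after every observed transition, the weights can be learned online, one step at a time, which matches the description of SARSA as an online method.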