Algorithm        Model   Policy   Iteration
SARSA(λ)         No      On       Value
LS-SARSA(λ)      No      On       Policy
Q Learning       No      Off      Value
Q(λ)             No      Off      ...
...              No      On       Policy
IAC              No      On       Policy
NAC              No      On       Policy
DynaSARSA(λ)     Yes     On       Value
DynaQ            Yes     Off      Value
DynaQ(λ)         Yes     Off      Value
DynaAC-QV        Yes     On       Policy

Table 1: Online RL algorithms used in our evaluation.

While ... for convergence, even if we use function approximation (Bhatnagar et al., 2007).

... Annual Congress on Evolutionary Computation, pp 1521–1528.
Atkeson, C.G., Santamaria, J.C., 1997, A comparison of direct and model-based reinforcement learning, IEEE Robotics and Automation, pp...