... Cambridge, MA. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y. 2000, Policy gradient methods for reinforcement learning with function approximation, In Advances in Neural Information Processing Systems ... booking to providing information or keeping company and forming long-term relationships with the users. Other interesting types of DS are tutorial systems, whose goal is to teach something ... simulates interactions. Learning or model-free algorithms only use training examples from previous interactions with the environment, and that is the main difference of these two categories, according...