Machine Learning and Robot Perception - Bruno Apolloni et al. (Eds), Part 15


Therefore, the prospective dialogue strategy taken by player1 is computed by taking into account the following probabilistic noise:

    \hat{b}_i = \hat{b}_k + N(0, \sigma^2)    (15)

where \hat{b}_i can be obtained by shifting the original pay-off matrix in Eq. (6). We suggest that this additive noise on the pay-off may play a crucial role in the stability of dialogue strategies and, in particular, prevents having to rely on dead reckoning. It is also reminiscent of a regularization effect in machine learning theory. In practice, regularization has been applied to a wide range of sensory technologies. In our case, the proposed dialogue strategy incorporating Eq. (15) is capable of real-world competence. This may be true for intelligent sensory technology, for instance that proposed by (Koshizen, 2002). Such a technology learns the cross-correlation among different sensors, selecting the sensors that best minimize the predictive localization error of a robot by modeling the uncertainty (Koshizen, 2001). That is, probabilistic models computed from sonar and infrared sensors were employed to estimate the robot's location.

Figures 8.6–8.7 describe the computational results from several simulations of the proposed dialogue strategy 'type 2', in which the players choose their dialogue actions statistically, subject to the approximation of the true pay-off matrix.

Altogether, the players interacted 5000 times. The interactions were organized into 100 sets, and each set consisted of 50 steps. The initial value of the pay-off matrix entries was 1000 points, and all components of the pay-off matrix were normalized. The plotted points represent the dialogue actions taken by player1 during the interactions. The rule of the interactions was assumed to follow the context of the modified IPD game. As it turned out, the actual pay-off matrix of player2 was cooperative, so the pay-off matrix was approximated by inhibiting anticooperative actions during the interactions.

Figures 8.8–8.9 illustrate the Total Squared Error (TSE), which corresponds to the Euclidean distance between the true pay-off matrix and the approximated pay-off matrix. The TSE was calculated during the IPD game between the two players. In our computation, the TSE is given by the following equation:

    TSE = \sum_{i=1}^{4} (b_i - \hat{b}_i)^2    (16)

Furthermore, let us penalize the TSE shown in Eq. (16). That is,

    TSE^* = \sum_{i=1}^{4} (b_i - \hat{b}_i)^2 + \lambda \Omega(f)    (17)

where \Omega(f) denotes the smoothness function, normally called the regularization term or regularizer (Tikhonov, 1963). In machine learning theory, the regularizer \Omega(f) represents the complexity term; it expresses the smoothness of the approximated function f of x. The regularizer has been applied in real-world applications (Vauhkonen, 1998; Koshizen and Rosseel, 2001). A crucial difference between Fig. 8.8 and Fig. 8.9 is the size of the variance before the dialogue strategies (type 1 and/or type 2) are undertaken. Furthermore, the second term of Eq. (17) corresponds to the smoothness of the (probabilistic) generative models obtained by a learning scheme. The models can be used for selective purposes in order to acquire the unique model that best fits the 'true' density function, so the result of the learning scheme can be further minimized. Generally, this process is called model selection.
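To make the quantities in Eqs. (15)–(17) concrete, the following Python sketch computes the noisy pay-off estimate, the TSE, and the penalized TSE* for a four-component pay-off vector. It is a minimal illustration only: the pay-off values, the noise scale sigma, the weight lam, and the squared-first-difference stand-in for the unspecified regularizer \Omega(f) are all assumptions, not the chapter's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 4-component pay-off vectors (placeholder values,
# not the chapter's actual matrices).
b_true = np.array([3.0, 0.0, 5.0, 1.0])   # true pay-offs b_i of player2
b_hat = np.array([2.6, 0.4, 4.5, 1.2])    # approximated pay-offs b_hat_i

# Eq. (15): perturb the estimate with zero-mean Gaussian noise.
sigma = 0.1                                # assumed noise scale
b_noisy = b_hat + rng.normal(0.0, sigma, size=b_hat.shape)

def tse(b, b_est):
    """Eq. (16): total squared error over the four pay-off components."""
    return float(np.sum((b - b_est) ** 2))

def tse_star(b, b_est, lam=0.1):
    """Eq. (17): TSE plus lambda * Omega(f). The smoothness term Omega(f)
    is left unspecified in the text; squared first differences of the
    estimate are used here purely as a stand-in regularizer."""
    omega = float(np.sum(np.diff(b_est) ** 2))
    return tse(b, b_est) + lam * omega

print(tse(b_true, b_noisy), tse_star(b_true, b_noisy))
```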
Theoretically, results brought by Eq. (17) are closely related to Eq. (16) (e.g., Hamza, 2002). In our case, TSE* is not calculated by the regularizer \Omega(f) explicitly, though it is implicitly brought about by the actual calculation according to Eq. (17).

We attempt to enhance our proposed computation by taking Eq. (17) into account. That is, the dialogue strategy must be able to reduce the uncertainty of the other's pay-offs. In practice, players inquire about their own pay-offs explicitly. The inquiry reducing the uncertainty of player2's pay-off \hat{B} corresponds to \Omega(f), which takes into account the past experiences of their inquiries. Additionally, \lambda may provide self-control of the interactive dialogue over the inquiries. In fact, too many inquiries would sometimes be regarded as troublesome to others, so self-control is needed.

During dialogue interaction, each player performs inquiries under our dialogue strategy in order to reduce the uncertainty of the true pay-off of player2. The uncertainty is also modeled by probability density functions. Thus, we expect that our proposed computation shown in Eq. (17) is capable of providing a better minimization than the TSE in Eq. (16), i.e., TSE* < TSE.

Figures 8.4 to 8.9 present several computational results obtained by the original model and the improved model. The biggest difference between the original and the improved models is that the approximated pay-off function f involves a probabilistic property. This certainly affects the dialogue strategy, which is capable of making the generative models smooth by reducing the uncertainty. In order to be effective, a long-lasting interaction between the two players must be ensured, as described before. The probabilistic property can cope with fluctuations of the other's pay-off. This can often resolve the problem that there is no longer a unique best strategy, as in the IPD game. The initial variances created by each action are relatively large (Figs. 8.4 and 8.6), whereas in Figs. 8.5 and 8.7 they are smaller. In these figures, + denotes a true pay-off value and (•) denotes an approximated value calculated by a pay-off function. Since Figs. 8.6 and 8.7 were obtained from the probabilistic pay-off function, the approximated values could come close to the true values; the inverse holds in Figs. 8.4 and 8.5. Additionally, Figs. 8.8 and 8.9 illustrate the TSE for the cases represented in Figs. 8.4–8.5 and Figs. 8.6–8.7. The final TSE is 0.650999 (Fig. 8.8, left), 0.011754 (Fig. 8.8, right), 0.0000161 (Fig. 8.9, left) and 0.000141 (Fig. 8.9, right), respectively. From all the results, one can see that the computation involving a probabilistic pay-off function showed better performance with respect to the TSE because it avoided the dead-reckoning problem across the other's pay-offs.
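The advantage of the probabilistic pay-off function can be reproduced in miniature. The toy simulation below is not the authors' experiment: it only borrows the interaction structure from the text (100 sets of 50 steps), assumes Gaussian fluctuation of the observed pay-off, and contrasts a last-observation ("dead-reckoning") estimate with a probabilistic estimate that averages observations under a Gaussian model. The averaged estimate concentrates near the true pay-off, loosely mirroring the ordering TSE* < TSE.

```python
import numpy as np

rng = np.random.default_rng(1)
b_true = np.array([3.0, 0.0, 5.0, 1.0])      # underlying pay-off (illustrative)

n_sets, steps = 100, 50                       # 100 sets x 50 steps = 5000 interactions
point_est = np.zeros(4)                       # deterministic estimate: latest observation only
mean_est = np.zeros(4)                        # probabilistic estimate: mean of a Gaussian model
n_obs = 0

for _ in range(n_sets):
    for _ in range(steps):
        # Each observed pay-off fluctuates around the true value.
        obs = b_true + rng.normal(0.0, 0.5, size=4)
        point_est = obs                       # "dead reckoning": trusts the latest noisy value
        n_obs += 1
        mean_est += (obs - mean_est) / n_obs  # incremental update of the Gaussian mean

tse_point = float(np.sum((b_true - point_est) ** 2))
tse_prob = float(np.sum((b_true - mean_est) ** 2))
print(f"last-observation TSE = {tse_point:.4f}, probabilistic TSE = {tse_prob:.6f}")
```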
Fig. 8.10. Analogous correspondences between pattern regression and user modeling:

    Pattern regression <-> User modeling
    Pattern classification <-> User classification
    Discriminant function <-> Pay-off function
    Mean squared error for discriminant function approximation (standard expectation maximization) <-> Total squared error for pay-off function approximation (mutual expectation maximization)
    Regularization to parameterize degree of generalization <-> Regularization to parameterize degree of satisfaction

Figure 8.10 shows analogous correspondences between pattern regression and user modeling. From the figure, we can clearly see strong structural similarities for each element, such as classification, functional approximation, and regularization. We can also see cross-correlations between pattern regression and user modeling.

8.6 Conclusion and Future Work

In this paper, we theoretically presented a computational method for user modeling (UM) that can be used for estimating the pay-offs of a user. Our proposed system allows the pay-off matrix of others to be approximated based on inductive game theory. This means that behavioral examples of others need to be employed in the pay-off approximation. The inductive game theory involves social cooperative issues, which take into account a dialogue strategy in terms of maximizing the pay-off function. We noted that the dialogue strategy had to bring into play long-lasting interactions with each other, so that the approximated pay-off matrix could be used for estimating the pay-offs of others. This forms a substructure for inducing social cooperative behavior, which leads to the maximum reciprocal expectation thereof.

In our work, we provided a computational model of the social cooperation mechanism using inductive game theory. In the theory, predictive dialogue strategies were assumed to be implemented based on behavioral decisions taken by others. Additionally, induction is taken as a general principle for the cognitive process of each individual.

The first simulation was carried out using the IPD game to minimize a total squared error (TSE), calculated between a true and an approximated pay-off matrix. It is noted that minimizing the TSE can essentially be identical to maximizing the expectation of a pay-off matrix. In the simulation, inquiring about the pay-offs of others was considered as a computational aspect of the dialogue strategy. Then, a second simulation, in which the pay-off matrix is approximated by a probability distribution function (PDF), was undertaken. Since we assumed that the pay-off matrix could fluctuate over time, a probabilistic form of the pay-off matrix is suitable for dealing with the uncertainty. Consequently, the result obtained by the second simulation (Section 8.5.2) provided better performance because it escaped the dead-reckoning problem of the pay-off matrix. Moreover, we also pointed out the significance of how the probabilistic pay-off function could cope with real-world competence when behavioral analysis was used to model the pay-offs of others.
In principle, the behavioral analysis can be calculated by sensory technology based on vision and audition. Additionally, there is no direct way to sense how the pay-off matrix changes in daily communication. This means that sensing technology has to come up with a way to reduce the uncertainty of someone's pay-offs; consequently, this could lead to approximating the pay-off distribution function accurately. Furthermore, in the second simulation, we pointed out that the proposed dialogue strategy could play a role in refining the estimated pay-off function. This is reminiscent of the model selection problem in machine learning theory, where (probabilistic) generative models are selectively eliminated in terms of generalization performance. Our dialogue strategy brings into play the on-line maintenance of user models. That is, the dialogue strategy leads to a long-lasting interaction, which allows user models to be selected in terms of approximating the true pay-off's density function. More specifically, the dialogue strategy would allow inquiry to reduce the uncertainty of the pay-offs of others. The timing and content quality of inquiries to others should also be noted as part of a human-like dialogue strategy involving cognitive capabilities. Our study has shown that inductive game theory effectively provides a theoretical motivation to elicit the proposed dialogue strategy, which is feasible with maximum mutual expectation and uncertainty reduction. Nevertheless, substantial further studies will be required to establish our algorithm in connection with inductive game theory.

Another future extension of this work could apply our proposed computation to humanoid robot applications, allowing humanoid robots to carry out reciprocal interactions with humans. Our computation of UM favors users who resist the temptation to defect for short-term gain and instead persist in mutual cooperation between robots and humans. A long-lasting interaction will thus require estimates of the other's pay-offs. Importantly, the long-lasting interaction could also be used to evaluate how much the robots gain the satisfaction of humans. We are convinced that this could be one of the most faithful aspects, particularly when humanoid robots are considered for man-machine interaction. Consequently, our work provides a new scheme of man-machine interaction, which is computed by maximizing a mutual expectation of the pay-off functions of others.

References

1 Newell, A., Unified Theories of Cognition, Cambridge, MA: Harvard University Press, 1990.
2 Newell, A., Unified Theories of Cognition, Cambridge, MA: Harvard University Press, 1990.
3 Fischer, G., User Modeling in Human-Computer Interaction. User Modeling and User-Adapted Interaction, 11:65-86, 2001.
4 Basu, C., Hirsh, H. and Cohen, W., Recommendation as Classification: Using Social and Content-Based Information in Recommendation. In: AAAI98 - Proceedings of the Fifteenth National Conference on Artificial Intelligence, Madison, Wisconsin, 714-720, 1998.
5 Gervasio, M., Iba, W. and Langley, P., Learning to Predict User Operations for Adaptive Scheduling. In: AAAI98 - Proceedings of the Fifteenth National Conference on Artificial Intelligence, Madison, Wisconsin, 721-726, 1998.
6 Nash, J.F., Non-Cooperative Games. Annals of Mathematics, 54:286-295, 1951.
7 Kaneko, M. and Matsui, A., Inductive Game Theory: Discrimination and Prejudices. Journal of Public Economic Theory, Blackwell Publishers, Inc., 1(1):101-137, 1999.
8 Axelrod, R. and Hamilton, W.D., The Evolution of Cooperation. Science, 211, 1390-1396, 1981.
9 Axelrod, R.M., The Evolution of Cooperation, New York: Basic Books, 1984.
10 Boyd, R., Is the Repeated Prisoner's Dilemma a Good Model of Reciprocal Altruism? Ethology and Sociobiology, 9, 211-222, 1988.
11 Nesse, R.M., Evolutionary Explanations of Emotions. Human Nature, 1, 261-289, 1990.
12 Trivers, R., Social Evolution, Menlo Park, CA: Cummings, 1985.
13 Webb, G.I., Pazzani, M.J. and Billsus, D., Machine Learning for User Modeling. User Modeling and User-Adapted Interaction, 11(1-2), 19-29, 2001.
14 Valiant, L.G., A Theory of the Learnable. Communications of the ACM, 27, 1134-1142, 1984.
15 Debreu, G., Continuity Properties of Paretian Utility. International Economic Review, 5, 285-293, 1964.
16 Baker, F. and Rachlin, H., Probability of Reciprocation in Repeated Prisoner's Dilemma Games. Journal of Behavioral Decision Making, 14, 51-67, John Wiley & Sons, Ltd., 2001.
17 Andre, E., Rist, T. and Mueller, J., Integrating Reactive and Scripted Behaviors in a Life-Like Presentation Agent. Proc. Int. Conf. Autonomous Agents, 261-268, 1998.
18 Noma, T., Zhao, L. and Badler, N.I., Design of a Virtual Human Presenter. IEEE Computer Graphics and Applications, 20(4), 79-85, 2000.
19 Legerstee, M., Barna, J. and DiAdamo, C., Precursors to the Development of Intention at 6 Months: Understanding People and Their Actions. Developmental Psychology, 36(5), 627-634, 2000.
20 Lieberman, H., Letizia: An Agent that Assists Web Browsing. In: IJCAI95 - Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Canada, 924-929, 1995.
21 Dempster, A.P., Laird, N.M. and Rubin, D.B., Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38, 1977.
22 Koshizen, T., Improved Sensor Selection Technique by Integrating. Journal of Intelligent and Robotic Systems, 2001.
23 Koshizen, T., Yamada, S. and Tsujino, H., Semantic Rewiring Mechanism of Neural Cross-supramodal Integration Based on Spatial and Temporal Properties of Attention. Neurocomputing, 52-54, 643-648, 2003.
24 Tikhonov, A.N., On Solving Ill-posed Problems and a Method of Regularization. Doklady Akademii Nauk USSR, 153, 501-504, 1963.
25 Vauhkonen, M., Vadasz, D. and Kaipio, J.P., Tikhonov Regularization and Prior Information in Electrical Impedance Tomography. IEEE Transactions on Medical Imaging, 17(2), 285-293, 1998.
26 Koshizen, T. and Rosseel, Y., A New EM Algorithm Using the Tikhonov Regularizer and the GMB-REM Robot's Position Estimation System. International Journal of Knowledge-Based Intelligent Engineering Systems, 5, 2-14, 2001.
27 Hamza, A.B., Krim, H. and Unal, G.B., Unifying Probabilistic and Variational Estimation. IEEE Signal Processing Magazine, 37-47, 2002.
