Hierarchical organization of consumer reviews for products and its applications

HIERARCHICAL ORGANIZATION OF CONSUMER REVIEWS FOR PRODUCTS AND ITS APPLICATIONS

YU JIANXING

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2012

© 2012 YU JIANXING

Acknowledgements

I would like to express my gratitude to all those who contributed and extended their valuable assistance to help me prepare and complete this thesis.

My deepest gratitude goes first and foremost to my advisor, Prof. Chua Tat-Seng, who led me through the four years of Ph.D. study and research. His perpetual enthusiasm, valuable insight, and unconventional vision in research have consistently motivated me to pursue my work on sentiment analysis. I am deeply grateful for his thoughtful, patient, and kind guidance during my graduate training. To me, Prof. Chua is not only an academic advisor, but also a role model and a lifetime mentor. His valuable advice has added considerably to my graduate experience, and his influence has undoubtedly extended beyond the research aspect of my life.

Besides my advisor, I wish to express my sincerest gratitude to my thesis committee, including Prof. Ng Hwee Tou, Prof. Tan Chew Lim, and the external examiners, for their critical reading and constructive criticism, which made the thesis as sound as possible. I benefited greatly from their encouragement, brilliant ideas, and high-standard questions. It is an incredible honor to be examined by such knowledgeable people.

Very special thanks go to Dr. Zha Zheng-Jun for his instructive guidance, insightful criticism, and inspiring questions. Dr. Zha spent much time discussing research topics with me and helped me overcome many obstacles. I would also like to thank all my labmates in the Lab for Media Search (LMS) for their stimulating discussions and enlightening suggestions on my work. I extend my thanks to Loo Line Fong for her ever-kind help in coordinating all administrative matters during my four years in the School of Computing.
Moreover, I must acknowledge the National University of Singapore and the School of Computing for their technical and financial support. Last but not least, my gratitude goes to my family and my friends, especially Guo Jiayan, for their constant support and sincere help throughout my life. Without them, this thesis would not have been possible. My gratitude towards them is truly beyond words.

Table of Contents

Acknowledgements
Abstract
List of Figures
List of Tables

Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Challenges
  1.4 Strategies
  1.5 Contributions
  1.6 Guide to This Thesis

Chapter 2 Literature Review
  2.1 Overview of Research Topics in Sentiment Analysis
  2.2 Generation of Hierarchy
    2.2.1 Product Aspect Identification
    2.2.2 Sentiment Classification on Product Aspects
    2.2.3 Acquisition of Parent-child Relations
      2.2.3.1 Pattern-based Approach
      2.2.3.2 Clustering-based Approach
  2.3 Product Aspect Ranking
    2.3.1 Related Work on Ranking of Reviews
    2.3.2 Document-level Sentiment Classification
    2.3.3 Extractive Review Summarization
  2.4 Question Answering (QA)
    2.4.1 Traditional QA
    2.4.2 Opinion QA
      2.4.2.1 Question Analysis and Answer Fragment Retrieval
      2.4.2.2 Answer Generation

Chapter 3 Hierarchical Organization of Consumer Reviews for Products
  3.1 Overview
  3.2 Hierarchical Organization Framework
    3.2.1 Preliminary and Notations
    3.2.2 Initial Hierarchy Acquisition
    3.2.3 Product Aspect Identification
    3.2.4 Generation of Aspect Hierarchy
      3.2.4.1 Formulation
      3.2.4.2 Linguistic Features for Semantic Distance Estimation
      3.2.4.3 Estimation of Semantic Distance
    3.2.5 Sentiment Classification on Product Aspects
  3.3 Evaluations
    3.3.1 Data Set and Experimental Settings
    3.3.2 Evaluations on Product Aspect Identification of Free Text Reviews
    3.3.3 Evaluations on Generation of Aspect Hierarchy
      3.3.3.1 Comparisons to the State-of-the-Art Methods
      3.3.3.2 Evaluations on the Effectiveness of the Initial Hierarchy
      3.3.3.3 Evaluations on the Effectiveness of Optimization Criteria
      3.3.3.4 Evaluations on Semantic Distance Learning
    3.3.4 Evaluations on Aspect-level Sentiment Classification
  3.4 Sub-tasks Reinforced by the Hierarchy
    3.4.1 Product Aspect Identification with the Hierarchy
    3.4.2 Sentiment Classification on Aspects using the Hierarchy
  3.5 Summary

Chapter 4 Product Aspect Ranking
  4.1 Overview
  4.2 Product Aspect Ranking Framework
    4.2.1 Notations and Problem Formulation
    4.2.2 Aspect Ranking Algorithm
  4.3 Evaluations
    4.3.1 Data Set and Experimental Settings
    4.3.2 Evaluations on Aspect Ranking
  4.4 Tasks Supported by Aspect Ranking
    4.4.1 Document-level Sentiment Classification
    4.4.2 Extractive Review Summarization
  4.5 Summary

Chapter 5 Opinion Question Answering on Products
  5.1 Overview
  5.2 Question Analysis and Answer Fragment Retrieval
  5.3 Answer Generation
    5.3.1 Formulation
    5.3.2 Salience Weight Estimation
    5.3.3 Coherence Weight Estimation
  5.4 Evaluations
    5.4.1 Data Set and Experimental Settings
    5.4.2 Evaluations on Question Analysis
    5.4.3 Evaluations on Answer Generation
      5.4.3.1 Comparisons to the State-of-the-Art Methods
      5.4.3.2 Evaluations on the Effectiveness of Multiple Criteria
  5.5 Summary

Chapter 6 Conclusions
  6.1 Research Summary and Significance
    6.1.1 Hierarchical Organization of Consumer Reviews
    6.1.2 Product Aspect Ranking
    6.1.3 Opinion-QA on Products
  6.2 Limitations of This Work
  6.3 Directions for Future Research

Bibliography
Publications

Abstract

Huge collections of consumer reviews for products are now available on the Web. These reviews contain rich opinionated information on various products. They have become a valuable resource that helps consumers understand products before making purchasing decisions and helps manufacturers understand consumer opinions so that they can improve their product offerings. However, such reviews are often unorganized, which makes information navigation and knowledge acquisition difficult. It is inefficient for users to gather public opinions on a product by reading through all the consumer reviews and manually analyzing the opinions in each review. To address this problem, this thesis focuses on discovering the natural structure inherent in consumer reviews and organizing them accordingly. Since a hierarchy can usually improve information dissemination and accessibility, we propose a domain-assisted approach to generate a hierarchical structure for organizing consumer reviews of products. The hierarchy is generated by simultaneously exploiting domain knowledge (e.g., the product specifications) and consumer reviews. It is a tree structure which organizes product aspects as nodes following their parent-child relations. An aspect refers to a component or an attribute of a certain product. For each aspect, the reviews and the corresponding opinions on this aspect are stored.
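As an illustration only, the tree described above can be sketched in Python as follows. The class name `AspectNode`, its fields, the helper `build_example`, and the example aspects and review snippets are all assumptions made for this sketch; they are not taken from the thesis, and the sketch does not reproduce the domain-assisted generation method itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AspectNode:
    """A product aspect in the review hierarchy (hypothetical structure)."""

    name: str
    # (review snippet, opinion label) pairs stored on this aspect.
    reviews: List[Tuple[str, str]] = field(default_factory=list)
    # Sub-aspects, i.e. the children in the parent-child relation.
    children: List["AspectNode"] = field(default_factory=list)

    def add_child(self, child: "AspectNode") -> "AspectNode":
        """Attach a sub-aspect and return it so it can be extended further."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["AspectNode"]:
        """Depth-first search for an aspect by name; None if it is absent."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def opinion_counts(self) -> Dict[str, int]:
        """Count opinion labels on this aspect and on all of its sub-aspects."""
        counts: Dict[str, int] = {}
        for _, opinion in self.reviews:
            counts[opinion] = counts.get(opinion, 0) + 1
        for child in self.children:
            for label, n in child.opinion_counts().items():
                counts[label] = counts.get(label, 0) + n
        return counts


def build_example() -> AspectNode:
    """Build a tiny, made-up hierarchy for a phone (illustrative data only)."""
    phone = AspectNode("phone")
    battery = phone.add_child(AspectNode("battery"))
    battery.reviews.append(("The battery drains quickly.", "negative"))
    life = battery.add_child(AspectNode("battery life"))
    life.reviews.append(("Battery life is excellent.", "positive"))
    screen = phone.add_child(AspectNode("screen"))
    screen.reviews.append(("The screen is sharp and bright.", "positive"))
    return phone
```

In this sketch, browsing at a coarser or finer level of granularity corresponds to calling `opinion_counts()` on a node nearer to or further from the root.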
Such a hierarchy provides a well-visualized way to browse consumer reviews at different levels of granularity to meet various users' information needs. With the hierarchy, users can easily grasp an overview of the consumer reviews and conveniently seek the desired information, such as the product aspects and consumer opinions. We conduct experiments on 11 popular products in four domains, with 70,359 consumer reviews on these products in total. This product review dataset has been released for future research. The experimental results demonstrate the effectiveness of the proposed approach. We further show experimentally that the generated hierarchy can reinforce the sub-tasks of product aspect identification and sentiment classification on aspects.

The generated hierarchy can be used to support a wide range of tasks. In this thesis, we investigate its usefulness in supporting two tasks: product aspect ranking, which aims to automatically identify important product aspects from consumer reviews, and opinion Question Answering (opinion-QA) on products, which tries to generate appropriate answers to opinionated questions about products. In particular, product aspect ranking identifies the important aspects according to two observations: (a) the important aspects of a product are usually commented on by a large number of consumers; and (b) consumer opinions on the important aspects greatly influence their overall opinions on the product. Given the review hierarchy of a certain product, we develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of the consumer opinions given to each aspect on their overall opinions. The experimental results on the product review dataset illustrate the efficacy of the proposed aspect ranking approach. Furthermore, we leverage aspect ranking to support the sub-tasks of document-level sentiment classification and extractive review summarization.
Significant performance improvements are achieved on these two sub-tasks. Additionally, we develop a new product opinion-QA framework with the help of the hierarchy, which enables accurate question analysis and effective answer generation. Specifically, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. The corresponding review fragments relevant to the aspects are then retrieved from the hierarchy. In order to generate the appropriate answers from review fragments, we develop a multi-criteria optimization answer generation approach which simultaneously takes into account review salience, coherence, diversity, and parent-child relations among the aspects. Evaluations are conducted on the product review dataset using 220 questions on the products. Significant performance improvements have been obtained, which demonstrate the effectiveness of
Publications

[P1] Chua Tat-Seng, Yu Jianxing, Zha Zheng-Jun, and Wang Meng. Domain-Assisted Hierarchical Organization of Product Consumer Reviews. US patent, Provisional Application No.: 61/622,970, filed on 11 April 2012.

[P2] Chua Tat-Seng, Yu Jianxing, Zha Zheng-Jun, and Wang Meng. Product Aspect Ranking. US patent, Provisional Application No.: 61/622,972, filed on 11 April 2012.

[B1] Yu Jianxing, Zha Zheng-Jun, and Chua Tat-Seng. Hierarchical Organization of Product Consumer Reviews. Chapter in the book “The People’s Web Meets NLP: Collaboratively Constructed Language Resources”, pp. 209-252, Springer, 2012.

[C4] Yu Jianxing, Zha Zheng-Jun, and Chua Tat-Seng. Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews. In Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP’12, Oral), pp. 391-401, Jeju, Korea, July 12-14, 2012.

[C3] Yu Jianxing, Zha Zheng-Jun, Wang Meng, Wang Kai, and Chua Tat-Seng. Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews. In Proceedings of the Conference on Empirical Methods on Natural Language Processing (EMNLP’11, Oral), pp. 140-150, Edinburgh, UK, July 27-31, 2011.

[C2] Yu Jianxing, Zha Zheng-Jun, Wang Meng, and Chua Tat-Seng. Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL’11, Oral), pp. 1496-1505, Portland, USA, June 19-24, 2011.

[C1] Yu Jianxing, Zha Zheng-Jun, Wang Meng, and Chua Tat-Seng. Hierarchical Organization of Unstructured Consumer Reviews. In Proceedings of the 20th International World Wide Web Conference (WWW’11), pp. 171-172, India, Mar 28-Apr 1, 2011.
... browse consumer reviews at different levels of granularity to meet various users’ needs. With the hierarchy, users can easily grasp the overview of consumer reviews and browse the desired information, such as product aspects and consumer opinions. For example, users can find that 623 of the 9,245 reviews are about the aspect “price”, with 241 positive and 382 negative reviews. The hierarchical organization ...

There are often two kinds of information in the UGC: opinionated and factual. A process is needed to distinguish these two kinds of information. Also, users are interested in different kinds of opinionated information in different types of UGC. For example, they care about product aspects in consumer reviews, and about opinion holders (i.e., reviewers) or hot events in news articles. For this ...

... the voice of the consumers from online reviews. In addition, the public opinions in the consumer reviews are all encoded in the hierarchy. These opinions can be used to answer users’ opinionated questions about the products. Opinionated questions often ask what consumers think and feel about the products or aspects of the products, such as “What’s everyone’s opinions on iPhone 4?”, and the answer is formed by ...

... acquisition. It is impractical for users to grasp the overview of consumer reviews and opinions on the various aspects of a product from such an enormous number of reviews. Among the hundreds of product aspects, it is also inefficient for users to browse consumer reviews and opinions on a particular aspect. Thus, there is a compelling need to discover the structure within the consumer reviews and organize them accordingly, ...
... important resource for both consumers and firms. Consumers commonly seek quality information from online reviews prior to purchasing products, while many firms use online reviews as useful feedback in their product development, marketing, and consumer relationship management.

1.2 Motivation

However, these numerous reviews are often unorganized, leading to difficulty in information navigation and knowledge ...

... platform for consumers to post reviews on millions of products. For example, the forum CNet.com involves more than seven million product reviews [22], whereas Pricegrabber.com contains millions of reviews on more than 32 million products in 20 distinct product categories over 11,000 merchants [101]. Such numerous consumer reviews ...

1 www.bing.com/shopping

Figure 1.1: Sample consumer reviews on website CNet.com

... contributions of this thesis are as follows:

Hierarchical Organization of Consumer Reviews. We propose a framework that generates a hierarchical structure to organize consumer reviews, so as to help users understand the knowledge inherent in the reviews. Moreover, we develop a domain-assisted approach to generate the review hierarchy by exploiting domain knowledge and consumer reviews. The generated ...

... limitations of the work and possible directions for future research are discussed.

Chapter 2 Literature Review

This chapter reviews the work related to this thesis. We first give an overview of current research topics in sentiment analysis. We then describe the work related to three topics: (a) hierarchical organization of consumer reviews for products; (b) product aspect ranking; and (c) opinion-QA on products, ...
... understanding the knowledge inherent within the reviews. Since a hierarchy can improve information dissemination and accessibility [20], we propose to generate a hierarchical structure to organize consumer reviews. Figure 1.2 illustrates a sample hierarchical organization for the product iPhone 3G. The hierarchy not only organizes all the product aspects and consumer opinions expressed in the reviews, ...

... reviews into a hierarchy, and leverage the hierarchy to support the tasks of product aspect ranking and opinion-QA on products. We outline the key ideas of these strategies in this section and detail them in Chapters 3, 4, and 5, respectively. In particular, we propose a new framework for hierarchical organization of consumer reviews. In the framework, we develop a domain-assisted approach to generate ...

... “call quality” of the product iPhone 3GS. Besides retail websites, many forum websites also provide a platform for consumers to post reviews on millions of products. For example, the forum CNet.com ...
