IT training LNAI 7867 trends and applications in knowledge discovery and data mining li, cao, wang, tan, liu, pei tseng 2013 09 05

LNAI 7867

Jiuyong Li, Longbing Cao, Can Wang, Kay Chen Tan, Bo Liu, Jian Pei, Vincent S. Tseng (Eds.)

Trends and Applications in Knowledge Discovery and Data Mining

PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD
Gold Coast, QLD, Australia, April 14-17, 2013
Revised Selected Papers

Lecture Notes in Artificial Intelligence
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

Volume Editors
Jiuyong Li, University of South Australia, Adelaide, SA, Australia. E-mail: jiuyong.li@unisa.edu.au
Longbing Cao and Can Wang, University of Technology, Sydney, NSW, Australia. E-mail: longbing.cao@uts.edu.au; canwang613@gmail.com
Kay Chen Tan, National University of Singapore, Singapore. E-mail: eletankc@nus.edu.sg
Bo Liu, Guangdong University of Technology, Guangzhou, China. E-mail: csbliu@gmail.com
Jian Pei, Simon Fraser University, Burnaby, BC, Canada. E-mail: jpei@cs.sfu.ca
Vincent S. Tseng, National Cheng Kung University, Tainan, Taiwan. E-mail: tsengsm@mail.ncku.edu.tw

ISSN 0302-9743, e-ISSN 1611-3349
ISBN 978-3-642-40318-7, e-ISBN 978-3-642-40319-4
DOI 10.1007/978-3-642-40319-4
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2013944975
CR Subject Classification (1998): H.2.8, I.2, H.3, H.5, H.4, I.5
LNCS Sublibrary: SL – Artificial Intelligence

© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

This volume contains papers presented at PAKDD Workshops 2013, affiliated with the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD) held on April 14, 2013 on the Gold Coast, Australia. PAKDD has established itself as the premier event for data mining researchers in the Pacific-Asia region. The workshops affiliated with PAKDD 2013 were: Data Mining Applications in Industry and Government (DMApps), Data Analytics for Targeted Healthcare (DANTH), Quality Issues, Measures of Interestingness and Evaluation of Data Mining Models (QIMIE), Biologically Inspired Techniques for Data Mining (BDM), Constraint Discovery and Application (CDA), Cloud Service Discovery (CloudSD), and Behavior Informatics (BI). This volume collects the revised papers from the first six workshops. The papers of BI will appear in a separate volume.

The first six workshops received 92 submissions. All papers were reviewed by at least two reviewers. In all, 47 papers were accepted for presentation, and their revised versions are collected in this volume. These papers mainly cover the applications of data mining in industry, government, and health care. The papers also cover some fundamental issues in data mining, such as interestingness measures and result evaluation, biologically inspired design, and constraint and cloud service discovery.

These workshops featured five invited speeches by distinguished researchers: Geoffrey I. Webb (Monash University, Australia), Osmar R. Zaïane (University of Alberta, Canada), Jian Pei (Simon Fraser University, Canada), Ning Zhong (Maebashi Institute of Technology, Japan), and Longbing Cao (University of Technology Sydney, Australia). Their talks cover current challenging issues and advanced applications in data mining.

The workshops would not have been successful without the support of the authors, reviewers, and organizers. We thank the many authors for submitting their research papers to the PAKDD workshops. We thank the successful authors whose papers are published in this volume for their collaboration in the paper revision and final submission. We appreciate all PC members for their timely reviews working to a
tight schedule. We also thank members of the Organizing Committees for organizing the paper submission, reviews, discussion, feedback, and the final submission. We appreciate the professional service provided by the Springer LNCS editorial teams, and Mr. Zhong She’s assistance in formatting.

June 2013
Jiuyong Li, Longbing Cao, Can Wang, Kay Chen Tan, Bo Liu

Organization

PAKDD Conference Chairs
Hiroshi Motoda, Osaka University, Japan
Longbing Cao, University of Technology, Sydney, Australia

Workshop Chairs
Jiuyong Li, University of South Australia, Australia
Kay Chen Tan, National University of Singapore, Singapore
Bo Liu, Guangdong University of Technology, China

Workshop Proceedings Chair
Can Wang, University of Technology, Sydney, Australia

Organizing Chair
Xinhua Zhu, University of Technology, Sydney, Australia

DMApps Chairs
Warwick Graco, Australian Taxation Office, Australia
Yanchang Zhao, Department of Immigration and Citizenship, Australia
Inna Kolyshkina, Institute of Analytics Professionals of Australia
Clifton Phua, SAS Institute Pte Ltd, Singapore

DANTH Chairs
Yanchun Zhang, Victoria University, Australia
Michael Ng, Hong Kong Baptist University, Hong Kong
Xiaohui Tao, University of Southern Queensland, Australia
Guandong Xu, University of Technology, Sydney, Australia
Yidong Li, Beijing Jiaotong University, China
Hongmin Cai, South China University of Technology, China
Prasanna Desikan, Allina Health, USA
Harleen Kaur, United Nations University, International Institute for Global Health, Malaysia

QIMIE Chairs
Stéphane Lallich, ERIC, Université Lyon 2, France
Philippe Lenca, Lab-STICC, Telecom Bretagne, France

BDM Chairs
Mengjie Zhang, Victoria University of Wellington, New Zealand
Shafiq Alam Burki, University of Auckland, New Zealand
Gillian Dobbie, University of Auckland, New Zealand

CDA Chairs
Chengfei Liu, Swinburne University of Technology, Australia
Jixue Liu, University of South Australia, Australia

CloudSD Chairs
Michael R Lyu, The Chinese University of Hong Kong, China
Jian Yang, Macquarie University, Australia
Jian Wu, Zhejiang University, China
Zibin Zheng, The Chinese University of Hong Kong, China

Combined Program Committee
Aiello Marco, University of Groningen, The Netherlands
Alípio Jorge, University of Porto, Portugal
Amadeo Napoli, Lorraine Research Laboratory in Computer Science and Its Applications, France
Arturas Mazeika, Max Planck Institute for Informatics, Germany
Asifullah Khan, PIEAS, Pakistan
Bagheri Ebrahim, Ryerson University, Canada
Blanca Vargas-Govea, Monterrey Institute of Technology and Higher Education, Mexico
Bo Yang, University of Electronic Science and Technology of China
Bouguettaya Athman, RMIT, Australia
Bruno Crémilleux, Université de Caen, France
Chaoyi Pang, CSIRO, Australia
David Taniar, Monash University, Australia
Dianhui Wang, La Trobe University, Australia
Emilio Corchado, University of Burgos, Spain
Eng-Yeow Cheu, Institute for Infocomm Research, Singapore
Evan Stubbs, SAS, Australia
Fabien Rico, Université Lyon 2, France
Fabrice Guillet, Université de Nantes, France
Fatos Xhafa, Universitat Politècnica de Catalunya, Barcelona, Spain
Fedja Hadzic, Curtin University, Australia
Feiyue Ye, Jiangsu Teachers University of Technology, China
Ganesh Kumar Venayagamoorthy, Missouri University of Science and Technology, USA
Gang Li, Deakin University, Australia
Gary Weiss, Fordham University, USA
Graham Williams, ATO, Australia
Guangfei Yang, Dalian University of Technology, China
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Hai Jin, Huazhong University of Science and Technology, China
Hangwei Qian, VMware Inc., USA
Hidenao Abe, Shimane University, Japan
Hong Cheu Liu, University of South Australia, Australia
Ismail Khalil, Johannes Kepler University, Austria
Izabela Szczech, Poznan University of Technology, Poland
Jan Rauch, University of Economics, Prague, Czech Republic
Jérôme Azé, Université Paris-Sud, France
Jean Diatta, Université de la Réunion, France
Jean-Charles Lamirel, LORIA, France
Jeff Tian, Southern Methodist University, USA
Jeffrey Soar, University of Southern Queensland, Australia
Jerzy Stefanowski, Poznan University of Technology, Poland
Ji Wang, National University of Defense Technology, China
Ji Zhang, University of Southern Queensland, Australia
Jianwen Su, UC Santa Barbara, USA
Jianxin Li, Swinburne University of Technology, Australia
Jie Wan, University College Dublin, Ireland
Jierui Xie, Oracle, USA
Jogesh K Muppala, University of Science and Technology of Hong Kong, Hong Kong
Joo-Chuan Tong, SAP Research, Singapore
José L Balcázar, Universitat Politècnica de Catalunya, Spain
Julia Belford, University of California, Berkeley, USA
Jun Ma, University of Wollongong, Australia
Junhu Wang, Griffith University, Australia
Kamran Shafi, University of New South Wales, Australia
Kazuyuki Imamura, Maebashi Institute of Technology, Japan
Khalid Saeed, AGH Krakow, Poland
Kitsana Waiyamai, Kasetsart University, Thailand
Kok-Leong Ong, Deakin University, Australia
Komate Amphawan, Burapha University, Thailand
Kouroush Neshatian, University of Canterbury, Christchurch, New Zealand
Kyong-Jin Shim, Singapore Management University
Liang Chen, Zhejiang University, China
Lifang Gu, Australian Taxation Office, Australia
Lin Liu, University of South Australia, Australia
Ling Chen, University of Technology, Sydney, Australia
Xumin Liu, Rochester Institute of Technology, USA
Luis Cavique, Universidade Aberta, Portugal
Martin Holeňa, Academy of Sciences of the Czech Republic
Md Sumon Shahriar, CSIRO ICT Centre, Australia
Michael Hahsler, Southern Methodist University, USA
Michael Sheng, The University of Adelaide, Australia
Mingjian Tang, Department of Human Services, Australia
Mirek Malek, University of Lugano, Switzerland
Mirian Halfeld Ferrari Alves, University of Orleans, France
Mohamed Gaber, University of Portsmouth, UK
Mohd Saberi Mohamad, Universiti Teknologi Malaysia, Malaysia
Mohyuddin Mohyuddin, King Abdullah International Medical Research Center, Saudi Arabia
Motahari-Nezhad Hamid Reza, HP, USA
Neil Yen, The University of Aizu, Japan
Patricia Riddle, University of Auckland, New Zealand
Paul Kwan, University of New England, Australia
Peter Christen, Australian National University, Australia
Peter Dolog, Aalborg University, Denmark
Peter O’Hanlon, Experian, Australia
Philippe Lenca, Telecom Bretagne, France
Qi Yu, Rochester Institute of Technology, USA
Radina Nikolic, British Columbia Institute of Technology, Canada
Redda Alhaj, University of Calgary, Canada
Ricard Gavaldà, Universitat Politècnica de Catalunya, Spain
Richi Nayek, Queensland University of Technology, Australia
Ritu Chauhan, Amity Institute of Biotechnology, India
Ritu Khare, National Institutes of Health, USA
Robert Hilderman, University of Regina, Canada
Robert Stahlbock, University of Hamburg, Germany
Rohan Baxter, Australian Taxation Office, Australia
Ross Gayler, La Trobe University, Australia
Rui Zhou, Swinburne University of Technology, Australia
Sami Bhiri, National University of Ireland, Ireland
Sanjay Chawla, University of Sydney, Australia
Shangguang Wang, Beijing University of Posts and Telecommunications, China
Shanmugasundaram Hariharan, Abdur Rahman University, India
Shusaku Tsumoto, Shimane University, Japan
Sorin Moga, Telecom Bretagne, France
Stéphane Lallich, Université Lyon 2, France
Stephen Chen, York University, Canada
Sy-Yen Kuo, National Taiwan University, Taiwan
Tadashi Dohi, Hiroshima University, Japan
Thanh-Nghi Do, Can Tho University, Vietnam
Ting Yu, University of Sydney, Australia
Tom Osborn, Brandscreen, Australia
Vladimir Estivill-Castro, Griffith University, Australia
Wei Luo, The University of Queensland, Australia
Weifeng Su, United International College, Hong Kong
Xiaobo Zhou, The Methodist Hospital, USA
Xiaoyin Xu, Brigham and Women’s Hospital, USA
Xin Wang, University of Calgary, Canada
Xue Li, University of Queensland, Australia
Yan Li, University of Southern Queensland, Australia
Yanchang Zhao, Department of Immigration and Citizenship, Australia
Yanjun Yan, ARCON Corporation, USA
Yin Shan, Department of Human Services, Australia
Yue Xu, Queensland University of Technology, Australia
Yun Sing Koh, University of Auckland, New Zealand
Zbigniew Ras, University of North Carolina at Charlotte, USA
Zhenglu Yang, University of Tokyo, Japan
Zhiang Wu, Nanjing University of Finance and Economics, China
Zhiquan George Zhou, University of Wollongong, Australia
Zhiyong Lu, National Institutes of Health, USA
Zongda Wu, Wenzhou University, China

Querying Compressed XML Data

If the file is very large, the time required for the decompression operation is negligible compared with the time required to decompress the whole file. Similarly, the overall complexity of the system is equal to the sum of the complexities of the search and decompression algorithms.

Conclusion

The compression of XML data remains an inevitable solution to problems related to the coexistence of large data volumes. In this context, mining compressed XML documents is beginning to take its place in the data mining research community. In this work, we have proposed a new querying model which ensures two major processes: the re-indexing and the querying of compressed XML data. It combines an adapted XML indexing plan, such as the Dietz numbering plan, with an XML document compressor, such as XMill, to facilitate querying compressed XML data. Compressed data are thus re-indexed based on a Dietz numbering plan adapted to our case. The querying process is then developed through the application of the B+Tree algorithm following the re-indexing process. Hence, the work is done during the separation of the structure from the content in the compression process. As future work, we propose to (i) improve the compression ratio by improving existing methods and (ii) take into account flexibility in the querying process.

Mining Approximate Keys Based on Reasoning from XML Data

Liu Yijun, Ye Feiyue, and He Sheng

School of Computer Engineering, Jiangsu University of Technology, Changzhou, Jiangsu, 213001, China
Key Laboratory of Cloud Computing & Intelligent Information Processing of Changzhou City, Changzhou, Jiangsu, 213001, China
{lyj,yfy,hs}@jsut.edu.cn

J. Li et al. (Eds.): PAKDD 2013 Workshops, LNAI 7867, pp. 499–510, 2013. © Springer-Verlag Berlin Heidelberg 2013

Abstract. Keys are very important for data management. Due to the hierarchical and flexible structure of XML, mining keys from XML data is a more complex and difficult task than mining them from relational databases. In this paper, we study mining approximate keys from XML data, and define the support and confidence of a key expression based on the number of null values on key paths. In the mining process, inference rules are used to derive new keys. Through the two-phase reasoning, a target set of approximate keys and its reduced set are obtained. We conducted experiments over ten benchmark XML datasets from XMark and four files in the UW XML Repository. The results show that the approach is feasible and efficient, and that it can discover effective keys in various XML data.

Keywords: XML, keys, data mining, support and confidence, key implication.

1 Introduction

XML is a generic form of semi-structured documents and data on the World Wide Web, and XML databases usually store semi-structured data integrated from various types of data sources. The problem of how to efficiently manage and query XML data has attracted a lot of research interest. Much work has been done in applying traditional integrity constraints in relational databases to XML databases, such as keys, foreign keys, functional dependencies and multi-valued dependencies [1,2,3,4,5,6]. As the unique identifiers of a record, keys are significantly important for database design and data management [7]. Various forms of key constraints for XML data can be found in [6,8,9,10,11]. In this paper we use the key definition proposed by Buneman et al. in [12,13]. They propose not only the concepts of absolute keys and relative keys independent of schema, which are in keeping with the hierarchically structured nature of XML, but also a sound and complete axiomatization for key implication. By using the inference rules, the keys can be reasoned about efficiently.

Though key definitions and their implication are suggested, there are still some issues that need to be considered in the practical mining of XML keys, as pointed out in [14]. Firstly, there could be no clear keys in XML data, which is semi-structured and usually integrated from multiple heterogeneous data sources. Secondly, an XML database may have a large number of keys, and therefore we should consider how to store them appropriately. Thirdly, the most important problem is how to find the keys holding in a given XML dataset in an efficient way.

Currently there is not much work in the literature on the practical mining of keys from XML data. Gösta Grahne et al. in [14] define the support and confidence of a key expression and a partial order on the set of all keys, and finally a reduced set of approximate keys is obtained. In this paper, we also study the issue of mining keys from
XML data. Considering the characteristics of XML data, we propose another universal approach for mining keys.

2 Key Definitions and Related Concepts

The discussions in this section are mainly based on the definitions by Buneman et al. [12,13].

2.1 The Tree Model for XML

An XML document is typically modeled as a labeled tree. A node of the tree represents an element, attribute or text (value), and edges represent the nested relationships between nodes. Node labels are divided into three pairwise disjoint sets: E, the finite set of element tags; A, the finite set of attribute names; and the singleton {S}, where S represents text (PCDATA). An XML tree is formally defined as follows.

Definition 1. An XML tree is a 6-tuple T = (r, V, lab, ele, att, val), where
• r is the unique root node in the tree, i.e. the document node, and r ∈ V.
• V is a finite set containing all nodes in T.
• lab is a function from V to E ∪ A ∪ {S}. For each v ∈ V, v is an element if lab(v) ∈ E, an attribute if lab(v) ∈ A, and a text node if lab(v) = S.
• Both ele and att are partial functions from V to V*. For each v ∈ V, if lab(v) ∈ E, ele(v) is a sequence of elements and text nodes in V and att(v) is a set of attributes in V. For each v’ ∈ ele(v) or v’ ∈ att(v), v’ is a child of v and there exists an edge from v to v’.
• val is a partial function from V to strings, mapping each attribute and text node to a string. For each v ∈ V, if lab(v) ∈ A or lab(v) = S, val(v) is the string value of v.

2.2 Path Expressions

In the XML tree, a node is uniquely identified by a path, i.e. a sequence of nodes. Because the concatenation operation does not have a uniform representation in XPath used in XML-Schema, Buneman et al. [12] have proposed an alternative syntax. For identifying nodes in an XML tree, we use their path languages called PLs, PLw and PL, where ε represents the empty path, l is a node label in E ∪ A ∪ {S}, and “.” is concatenation.

In PLs, a valid path is the empty path or a sequence of node labels. PLw allows the symbol “_”, which can match any node label. PL includes the symbol “_*”, which represents any sequence of node labels. The notation P ⊆ Q denotes that the language defined by P is a subset of the language defined by Q. For the path expression P and the node n, the notation n[P] denotes the set of nodes in T that can be reached by following a path that conforms to P from n. The notation [P] is the abbreviation for r[P], where r is the root in T. The notation |P| denotes the number of labels in the path: |ε| is 0, and “_” and “_*” are both counted as labels with length 1. The paths which are merely sequences of labels are called simple paths.

2.3 Definitions on Keys

Definition 2. A key constraint φ for XML is an expression (Q’, (Q, {P1,…, Pk})) where Q’, Q and Pi are path expressions. Q’ is called the context path, Q is called the target path, and the Pi are called the key paths of φ. If Q’ = ε, φ is called an absolute key; otherwise φ is called a relative key. The expression (Q, S) is the abbreviation of (ε, (Q, S)), where S = {P1,…, Pk}.

Definition 3. Let φ = (Q’, (Q, {P1,…, Pk})) be a key expression. An XML tree T satisfies φ, denoted as T |= φ, if and only if for every n ∈ [Q’], given any two nodes n1, n2 ∈ n[Q], if for all i, 1 ≤ i ≤ k, there exist z1 ∈ n1[Pi] and z2 ∈ n2[Pi] such that z1 =v z2, then n1 = n2. That is,

$$\forall n_1, n_2 \in n[Q] \left( \left( \bigwedge_{1 \le i \le k} \exists z_1 \in n_1[P_i]\ \exists z_2 \in n_2[P_i]\ (z_1 =_v z_2) \right) \rightarrow n_1 = n_2 \right)$$

This definition of keys is quite weak. The key expression could hold even though key paths are missing at some nodes. This definition is consistent with the semi-structured nature of XML, but does not mirror the requirements imposed by a key in relational databases, i.e. uniqueness of a key and equality of key values. A definition which meets both requirements is proposed in [12].

Definition 4. Let φ = (Q’, (Q, {P1,…, Pk})) be a key expression. An XML tree T satisfies φ if and only if for any n ∈ [Q’],
(1) For any n’ in n[Q] and for all Pi (1 ≤ i ≤ k), Pi exists and is unique at n’.
(2) For any two nodes n1, n2 ∈ n[Q], if n1[Pi] =v n2[Pi] for all i, 1 ≤ i ≤ k, then n1 = n2.

This definition of keys is stronger than Definition 3: the key paths are required to exist and be unique. Note that there may be empty tags in XML documents. A consequence is that some nodes in n’[Pi] are null-valued, which is allowed in the definition. However, the attributes of the primary key in relational databases are not allowed to be null. Here we explore a strong key definition which captures this requirement.

Definition 5. Let φ = (Q’, (Q, {P1,…, Pk})) be a key expression. An XML tree T satisfies φ if and only if for any n ∈ [Q’],
(1) For any n’ in n[Q] and for all Pi (1 ≤ i ≤ k), Pi exists and is unique at n’, and all nodes in n’[Pi] are not null-valued.
(2) For any two nodes n1, n2 ∈ n[Q], if n1[Pi] =v n2[Pi] for all i, 1 ≤ i ≤ k, then n1 = n2.

In the definition of strong keys, the key paths are required to exist, be unique and not have a null value.

In relational databases, a tuple can be identified by more than one group of key attributes. Analogously, given a context path Q’ and a target path Q in the XML tree T, there may exist multiple sets S of key paths such that T |= (Q’, (Q, S)).

Definition 6. Let φ = (Q’, (Q, S)) be a key expression satisfied in the XML tree T. If for any key expression φ’ = (Q’, (Q, S’)) satisfied in T, |S|
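
As a rough illustration of the strong key definition above (key paths must exist, be unique, carry a non-null value, and distinguish target nodes), the following Python sketch checks a key expression against a small document. It is not the authors' mining algorithm. It assumes simple, element-only paths written in ElementTree's "a/b" syntax with "." standing for the empty path ε, and it approximates value equality (=v) by comparing stripped text content. The function name `satisfies_strong_key` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def satisfies_strong_key(root, context, target, key_paths):
    """Check the strong-key conditions for simple, element-only paths.

    Assumption: paths use ElementTree syntax ("a/b"); "." is the empty path.
    Assumption: value equality is approximated by equality of stripped text.
    """
    for n in root.findall(context):          # every node n in [Q']
        seen = set()
        for t in n.findall(target):          # every target node in n[Q]
            values = []
            for p in key_paths:
                hits = t.findall(p)
                # Condition (1): the key path exists, is unique, and is not null-valued.
                if len(hits) != 1 or not (hits[0].text or "").strip():
                    return False
                values.append(hits[0].text.strip())
            # Condition (2): two distinct target nodes must not agree on all key paths.
            key = tuple(values)
            if key in seen:
                return False
            seen.add(key)
    return True


# Hypothetical sample data: the third book has an empty <title> tag.
DOC = """
<db>
  <book><isbn>111</isbn><title>XML</title></book>
  <book><isbn>222</isbn><title>XML</title></book>
  <book><isbn>333</isbn><title></title></book>
</db>
"""

root = ET.fromstring(DOC)
print(satisfies_strong_key(root, ".", "book", ["isbn"]))   # isbn: present, unique, non-empty
print(satisfies_strong_key(root, ".", "book", ["title"]))  # title: duplicated and empty values
```

Under these assumptions, (ε, (book, {isbn})) holds as a strong key in the sample document, while (ε, (book, {title})) does not, because two books share a title and one title is empty.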


Contents

Preface

Organization

Table of Contents

Data Mining Applications in Industry and Government

Using Scan-Statistical Correlations for Network Change Analysis

1 Introduction

2 Related Work

3 Scan-Statistics

4 Scan-Statistical Correlations

5 Multi-level Correlations Analysis

5.1 Aggregation of Correlation Data

5.2 Global Network Graph Correlation (G)

5.3 Vertex Level Correlation (V)

5.4 Vertex-to-Vertex Correlation (V×V)

6 A Multi-level Network Change Analysis Scheme

7 Experiments

8 Conclusions

References

Predicting High Impact Academic Papers Using Citation Network Features

1 Introduction

2 Related Work

3 The Scopus Database

4 Methods

4.1 Measuring Paper Impact

4.2 Predictive Features
