Data Mining: Know It All

Document information

Data Mining: Know It All

Soumen Chakrabarti, Thomas P. Nadeau, Earl Cox, Richard E. Neapolitan, Eibe Frank, Dorian Pyle, Ralf Hartmut Güting, Mamdouh Refaat, Jiawei Han, Markus Schneider, Xia Jiang, Toby J. Teorey, Micheline Kamber, Ian H. Witten, Sam S. Lightstone

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

Morgan Kaufmann Publishers is an imprint of Elsevier. 30 Corporate Drive, Suite 400, Burlington, MA 01803

This book is printed on acid-free paper.

Copyright © 2009 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, scanning, or otherwise, without prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data
Chakrabarti, Soumen.
Data mining: know it all / Soumen Chakrabarti et al.
p. cm. (Morgan Kaufmann know it all series)
Includes bibliographical references and index.
ISBN 978-0-12-374629-0 (alk. paper)
1. Data mining. I. Title.
QA76.9.D343C446 2008
005.74-dc22
2008040367

For information on all Morgan Kaufmann publications, visit our Website at www.mkp.com or www.books.elsevier.com

Printed in the United States
08 09 10 11 12 10

Working together to grow libraries in developing countries
www.elsevier.com | www.bookaid.org | www.sabre.org

Contents

  • About This Book
  • Contributing Authors
  • Chapter 1 What’s It All About?
    • 1.1 Data Mining and Machine Learning
    • 1.2 Simple Examples: The Weather Problem and Others
    • 1.3 Fielded Applications
    • 1.4 Machine Learning and Statistics
    • 1.5 Generalization as Search
    • 1.6 Data Mining and Ethics
    • 1.7 Resources
  • Chapter 2 Data Acquisition and Integration
    • 2.1 Introduction
    • 2.2 Sources of Data
    • 2.3 Variable Types
    • 2.4 Data Rollup
    • 2.5 Rollup with Sums, Averages, and Counts
    • 2.6 Calculation of the Mode
    • 2.7 Data Integration
  • Chapter 3 Data Preprocessing
    • 3.1 Why Preprocess the Data?
    • 3.2 Descriptive Data Summarization
    • 3.3 Data Cleaning
    • 3.4 Data Integration and Transformation
    • 3.5 Data Reduction
    • 3.6 Data Discretization and Concept Hierarchy Generation
    • 3.7 Summary
    • 3.8 Resources
  • Chapter 4 Physical Design for Decision Support, Warehousing, and OLAP
    • 4.1 What Is Online Analytical Processing?
    • 4.2 Dimension Hierarchies
    • 4.3 Star and Snowflake Schemas
    • 4.4 Warehouses and Marts
    • 4.5 Scaling up the System
    • 4.6 DSS, Warehousing, and OLAP Design Considerations
    • 4.7 Usage Syntax and Examples for Major Database Servers
    • 4.8 Summary
    • 4.9 Literature Summary
    • Resources
  • Chapter 5 Algorithms: The Basic Methods
    • 5.1 Inferring Rudimentary Rules
    • 5.2 Statistical Modeling
    • 5.3 Divide and Conquer: Constructing Decision Trees
    • 5.4 Covering Algorithms: Constructing Rules
    • 5.5 Mining Association Rules
    • 5.6 Linear Models
    • 5.7 Instance-based Learning
    • 5.8 Clustering
    • 5.9 Resources
  • Chapter 6 Further Techniques in Decision Analysis
    • 6.1 Modeling Risk Preferences
    • 6.2 Analyzing Risk Directly
    • 6.3 Dominance
    • 6.4 Sensitivity Analysis
    • 6.5 Value of Information
    • 6.6 Normative Decision Analysis
  • Chapter 7 Fundamental Concepts of Genetic Algorithms
    • 7.1 The Vocabulary of Genetic Algorithms
    • 7.2 Overview
    • 7.3 The Architecture of a Genetic Algorithm
    • 7.4 Practical Issues in Using a Genetic Algorithm
    • 7.5 Review
    • 7.6 Resources
  • Chapter 8 Data Structures and Algorithms for Moving Objects Types
    • 8.1 Data Structures
    • 8.2 Algorithms for Operations on Temporal Data Types
    • 8.3 Algorithms for Lifted Operations
    • 8.4 Resources
  • Chapter 9 Improving the Model
    • 9.1 Learning from Errors
    • 9.2 Improving Model Quality, Solving Problems
    • 9.3 Summary
  • Chapter 10 Social Network Analysis
    • 10.1 Social Sciences and Bibliometry
    • 10.2 PageRank and Hyperlink-Induced Topic Search
    • 10.3 Shortcomings of the Coarse-grained Graph Model
    • 10.4 Enhanced Models and Techniques
    • 10.5 Evaluation of Topic Distillation
    • 10.6 Measuring and Modeling the Web
    • 10.7 Resources
  • Index

About This Book

All of the elements about data mining are here together in a single resource written by the best and brightest experts in the field! This book consolidates both introductory and advanced topics, thereby covering the gamut of data mining and machine learning tactics, from data integration and preprocessing to fundamental algorithms to optimization techniques and web mining methodology.

Data Mining: Know It All expertly combines the finest data mining material from the Morgan Kaufmann portfolio with individual chapters contributed by a select group of authors. They have been combined into one comprehensive book in a way that allows it to be used as a reference work for those interested in new and developing aspects of data mining. This book represents a quick and efficient way to unite valuable content from leaders in the data mining field, thereby creating a definitive, one-stop-shopping opportunity to access information you would otherwise need to round up from disparate sources.

... without benefiting from it at all. Earlier we defined data mining operationally as the process of discovering patterns, automatically or semiautomatically, in large quantities of data and the ...

Date posted: 11/04/2017, 10:19

Table of contents

  • Front cover

  • Data Mining: Know It All

  • Copyright page

  • Table of contents

  • About This Book

  • Contributing Authors

  • Chapter 1 What’s It All About?

    • 1.1 DATA MINING AND MACHINE LEARNING

    • 1.2 SIMPLE EXAMPLES: THE WEATHER PROBLEM AND OTHERS

    • 1.3 FIELDED APPLICATIONS

    • 1.4 MACHINE LEARNING AND STATISTICS

    • 1.5 GENERALIZATION AS SEARCH

    • 1.6 DATA MINING AND ETHICS

    • 1.7 RESOURCES

  • Chapter 2 Data Acquisition and Integration

    • 2.1 INTRODUCTION

    • 2.2 SOURCES OF DATA

    • 2.3 VARIABLE TYPES

    • 2.4 DATA ROLLUP

    • 2.5 ROLLUP WITH SUMS, AVERAGES, AND COUNTS

    • 2.6 CALCULATION OF THE MODE

    • 2.7 DATA INTEGRATION
