Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4, Introduction to Data Mining


Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4, Introduction to Data Mining, by Tan, Steinbach, Kumar.
© Tan, Steinbach, Kumar, Introduction to Data Mining

Classification: Definition

Given a collection of records (the training set):
– Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set to validate it.

Illustrating Classification Task

[Figure: an induction algorithm learns a model from the training set; the model is then applied to the test set.]

Examples of Classification Task

– Predicting tumor cells as benign or malignant
– Classifying credit card transactions as legitimate or fraudulent
– Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
– Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques

– Decision Tree based Methods
– Rule-based Methods
– Memory based reasoning
– Neural Networks
– Naïve Bayes and Bayesian Belief Networks
– Support Vector Machines

Example of a Decision Tree

Training data (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single          70K             No
 4   Yes     Married         120K            No
 5   No      Divorced        95K             Yes
 6   No      Married         60K             No
 7   Yes     Divorced        220K            No
 8   No      Single          85K             Yes
 9   No      Married         75K             No
10   No      Single          90K             Yes

Model: a decision tree with splitting attributes Refund, MarSt, and TaxInc:

Refund?
– Yes → NO
– No → MarSt?
    • Married → NO
    • Single, Divorced → TaxInc?
        ◦ < 80K → NO
        ◦ > 80K → YES

Another Example of Decision Tree

The same training data is also fit by a tree that splits on MarSt first:

MarSt?
– Married → NO
– Single, Divorced → Refund?
    • Yes → NO
    • No → TaxInc?
        ◦ < 80K → NO
        ◦ > 80K → YES

There could be more than one tree that fits the same data!

Decision Tree Classification Task

[Figure: a tree-induction algorithm learns a decision tree from the training set; the tree is then applied to classify the test set.]

Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Start from the root of the tree and follow the branches that match the record. With the first tree: Refund = No → MarSt = Married → leaf NO, so predict Cheat = No.

Model Evaluation

– Metrics for Performance Evaluation: how to evaluate the performance of a model?
– Methods for Performance Evaluation: how to obtain reliable estimates?
– Methods for Model Comparison: how to compare the relative performance among competing models?
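The tree walk described above is easy to hand-code. The sketch below is not from the lecture notes: the function name, the use of plain strings, and income given in thousands are my own choices; it simply encodes the first example tree as nested conditionals.

```python
# The decision tree from the first example, hand-coded as nested
# conditionals. Income is in thousands (e.g. 80 means 80K).

def classify(refund, marital_status, taxable_income):
    """Predict the Cheat label ('Yes' or 'No') for one record."""
    if refund == "Yes":
        return "No"                                  # Refund = Yes -> NO
    if marital_status == "Married":
        return "No"                                  # MarSt = Married -> NO
    # MarSt is Single or Divorced: split on Taxable Income
    return "No" if taxable_income < 80 else "Yes"    # TaxInc < 80K -> NO

# The test record from the slides: Refund = No, Married, 80K
print(classify("No", "Married", 80))  # -> No
```

Running all ten training records through this function reproduces every Cheat label in the table, which is what the slides mean by a tree that "fits" the data.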
ROC (Receiver Operating Characteristic)

– Developed in the 1950s for signal detection theory, to analyze noisy signals; it characterizes the trade-off between positive hits and false alarms.
– An ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis).
– The performance of each classifier is represented as a point on the ROC curve: changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point.

ROC Curve

[Figure: a 1-dimensional data set containing 2 classes (positive and negative); any point located at x > t is classified as positive.]
At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88.

Reading points as (TP, FP):
– (0, 0): declare everything to be the negative class
– (1, 1): declare everything to be the positive class
– (1, 0): ideal
– Diagonal line: random guessing
    • below the diagonal line, the prediction is the opposite of the true class

Using ROC for Model Comparison

– Neither model consistently outperforms the other: M1 is better for small FPR, M2 is better for large FPR.
– Area Under the ROC Curve (AUC): ideal model, area = 1; random guessing, area = 0.5.

How to Construct an ROC Curve

Use a classifier that produces a posterior probability P(+|A) for each test instance A:

Instance   P(+|A)   True Class
 1         0.95     +
 2         0.93     +
 3         0.87     -
 4         0.85     -
 5         0.85     -
 6         0.85     +
 7         0.76     -
 8         0.53     +
 9         0.43     -
10         0.25     +

– Sort the instances according to P(+|A) in decreasing order.
– Apply a threshold at each unique value of P(+|A).
– Count TP, FP, TN, FN at each threshold.
– TP rate: TPR = TP/(TP + FN); FP rate: FPR = FP/(FP + TN).

Sweeping the threshold (classify as + when P(+|A) >= threshold):

Threshold >=  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP            5     4     4     3     3     3     3     2     2     1     0
FP            5     5     4     4     3     2     1     1     0     0     0
TN            0     0     1     1     2     3     4     4     5     5     5
FN            0     1     1     2     2     2     2     3     3     4     5
TPR           1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR           1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

Plotting (FPR, TPR) for each threshold gives the ROC curve.

Test of Significance

Given two models:
– Model M1: accuracy = 85%, tested on 30 instances
– Model M2: accuracy = 75%, tested on 5000 instances
Can we say M1 is better than M2?
– How much confidence can we place on the accuracy of M1 and M2?
– Can the difference in performance be explained as the result of random fluctuations in the test set?

Confidence Interval for Accuracy

Each prediction can be regarded as a Bernoulli trial:
– A Bernoulli trial has 2 possible outcomes; for a prediction these are correct or wrong.
– A collection of Bernoulli trials has a binomial distribution: x ∼ Bin(N, p), where x is the number of correct predictions.
– Example: toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25.
Given x (the number of correct predictions), or equivalently acc = x/N, and N (the number of test instances), can we predict p (the true accuracy of the model)?

For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1 − p)/N:

P( Z_{α/2} < (acc − p) / sqrt(p(1 − p)/N) < Z_{1−α/2} ) = 1 − α

Solving for p gives the (1 − α) confidence interval

p = ( 2·N·acc + Z²_{α/2} ± Z_{α/2} · sqrt(Z²_{α/2} + 4·N·acc − 4·N·acc²) ) / ( 2·(N + Z²_{α/2}) )

Comparing Performance of 2 Models

Given model M1 tested on set D1 (size n1) with error rate e1, and model M2 tested on D2 (size n2) with error rate e2: for sufficiently large n1 and n2, the observed difference d = |e1 − e2| is approximately normal with variance

σ_d² ≈ e1(1 − e1)/n1 + e2(1 − e2)/n2

and confidence interval d_t = d ± Z_{α/2} · σ̂_d.

An Illustrative Example

For M1 (n1 = 30, e1 = 0.15) and M2 (n2 = 5000, e2 = 0.25): d = |e2 − e1| = 0.100. At 95% confidence, Z_{α/2} = 1.96:

σ̂_d² = 0.15(1 − 0.15)/30 + 0.25(1 − 0.25)/5000 ≈ 0.0043
d_t = 0.100 ± 1.96 × sqrt(0.0043) = 0.100 ± 0.128

The interval contains 0, so the difference may not be statistically significant.

Comparing Performance of 2 Algorithms

Each learning algorithm may produce k models:
– L1 may produce M11, M12, …, M1k
– L2 may produce M21, M22, …, M2k
If the models are evaluated on the same test sets D1, D2, …, Dk (e.g., via k-fold cross-validation):
– For each set, compute d_j = e_{1j} − e_{2j}.
– d_j has mean d_t and variance σ_t², estimated by

σ̂_t² = Σ_{j=1}^{k} (d_j − d̄)² / ( k·(k − 1) )

d_t = d̄ ± t_{1−α, k−1} · σ̂_t
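The ROC-construction procedure above can be replayed in a few lines. This is a minimal sketch using only the 10 scored instances from the slides; the variable names are my own.

```python
# Sweep a threshold over each unique score (plus 1.0) and count
# true/false positives, exactly as the construction slide describes.

scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ['+', '+', '-', '-', '-', '+', '-', '+', '-', '+']

P = labels.count('+')          # total positives (5)
N = labels.count('-')          # total negatives (5)

points = []
for t in sorted(set(scores + [1.0]), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == '+')
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == '-')
    points.append((fp / N, tp / P))    # (FPR, TPR)

# Curve runs from (0, 0) at threshold 1.0 to (1, 1) at threshold 0.25
print(points)
```

The resulting (FPR, TPR) pairs match the table above at each unique threshold; connecting them traces the ROC curve.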
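The arithmetic of the two-model comparison can be checked directly. A sketch using only the numbers from the slides; the variable names are mine, and 1.96 is the standard normal quantile for 95% confidence.

```python
# Compare M1 (error 0.15 on 30 instances) with M2 (error 0.25 on 5000).
from math import sqrt

n1, e1 = 30, 0.15
n2, e2 = 5000, 0.25
z = 1.96                       # Z value for a 95% confidence level

d = abs(e2 - e1)               # observed difference: 0.100
var_d = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2   # ~0.0043
margin = z * sqrt(var_d)       # ~0.128

lo, hi = d - margin, d + margin
print(lo, hi)                  # the interval straddles 0
```

Because the interval contains 0, the 10-point accuracy gap between M1 and M2 may not be statistically significant, as the slides conclude.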

Posted: 15/03/2014, 09:20


Table of Contents

  • Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation

  • Classification: Definition

  • Illustrating Classification Task

  • Examples of Classification Task

  • Classification Techniques

  • Example of a Decision Tree

  • Another Example of Decision Tree

  • Decision Tree Classification Task

  • Apply Model to Test Data

  • Slide 10

  • Slide 11

  • Slide 12

  • Slide 13

  • Slide 14

  • Slide 15

  • Decision Tree Induction

  • General Structure of Hunt’s Algorithm

  • Hunt’s Algorithm

  • Tree Induction

  • Slide 20

  • How to Specify Test Condition?

  • Splitting Based on Nominal Attributes

  • Splitting Based on Ordinal Attributes

  • Splitting Based on Continuous Attributes

  • Slide 25

  • Slide 26

  • How to determine the Best Split

  • Slide 28

  • Measures of Node Impurity

  • How to Find the Best Split

  • Measure of Impurity: GINI

  • Examples for computing GINI

  • Splitting Based on GINI

  • Binary Attributes: Computing GINI Index

  • Categorical Attributes: Computing Gini Index

  • Continuous Attributes: Computing Gini Index

  • Continuous Attributes: Computing Gini Index...

  • Alternative Splitting Criteria based on INFO

  • Examples for computing Entropy

  • Splitting Based on INFO...

  • Slide 41

  • Splitting Criteria based on Classification Error

  • Examples for Computing Error

  • Comparison among Splitting Criteria

  • Misclassification Error vs Gini

  • Slide 46

  • Stopping Criteria for Tree Induction

  • Decision Tree Based Classification

  • Example: C4.5

  • Practical Issues of Classification

  • Underfitting and Overfitting (Example)

  • Underfitting and Overfitting

  • Overfitting due to Noise

  • Overfitting due to Insufficient Examples

  • Notes on Overfitting

  • Estimating Generalization Errors

  • Occam’s Razor

  • Minimum Description Length (MDL)

  • How to Address Overfitting

  • How to Address Overfitting…

  • Example of Post-Pruning

  • Examples of Post-pruning

  • Handling Missing Attribute Values

  • Computing Impurity Measure

  • Distribute Instances

  • Classify Instances

  • Other Issues

  • Data Fragmentation

  • Search Strategy

  • Expressiveness

  • Decision Boundary

  • Oblique Decision Trees

  • Tree Replication

  • Model Evaluation

  • Slide 75

  • Metrics for Performance Evaluation

  • Metrics for Performance Evaluation…

  • Limitation of Accuracy

  • Cost Matrix

  • Computing Cost of Classification

  • Cost vs Accuracy

  • Cost-Sensitive Measures

  • Slide 83

  • Methods for Performance Evaluation

  • Learning Curve

  • Methods of Estimation

  • Slide 87

  • ROC (Receiver Operating Characteristic)

  • ROC Curve

  • Slide 90

  • Using ROC for Model Comparison

  • How to Construct an ROC curve

  • How to construct an ROC curve

  • Test of Significance

  • Confidence Interval for Accuracy

  • Slide 96

  • Slide 97

  • Comparing Performance of 2 Models

  • Slide 99

  • An Illustrative Example

  • Comparing Performance of 2 Algorithms
