Data Mining and Knowledge Discovery Handbook, 2 Edition part 106 ppt

1030 Steve Moyle

setting. Such well defined and strong processes include, for instance, clear model evaluation procedures (Blockeel and Moyle, 2002).

Different perspectives exist on what collaborative Data Mining is (this is discussed further in section 54.5). Three interpretations are: 1) multiple software agents applying Data Mining algorithms to solve the same problem; 2) humans using modern collaboration techniques to apply Data Mining to a single, defined problem; 3) Data Mining the artifacts of human collaboration. This chapter will focus solely on the second item – that of humans using collaboration techniques to apply Data Mining to a single task. With sufficient definition of a particular Data Mining problem, this is similar to a multiple software agent Data Mining framework (the first item), although this is not the aim of the chapter. Many of the difficulties encountered in human collaboration will also be encountered in designing a system for software agent collaboration.

Collaborative Data Mining aims to combine the results generated by isolated experts, by enabling the collaboration of geographically dispersed laboratories and companies. For each Data Mining problem, a virtual team of experts is selected on the basis of adequacy and availability. Experts apply their methods to solving the problem – but also communicate with each other to share their growing understanding of the problem. It is here that collaboration is key.

The process of analyzing data through models has many similarities to experimental research. Like the process of scientific discovery, Data Mining can benefit from different techniques used by multiple researchers who collaborate, compete, and compare results to improve their combined understanding.

The rest of this chapter is organized as follows. The potential difficulties in (remote) collaboration and a framework for analyzing such difficulties are outlined.
A standard Data Mining process is reviewed, and studied for the likely contributions that can be achieved collaboratively. A collaboration process for Data Mining is presented, with clear guidelines for the practitioner so that they may avoid the potential pitfalls related to collaborative Data Mining. A brief summary of real examples of the application of collaborative Data Mining is presented. The chapter concludes with a discussion.

54.2 Remote Collaboration

This section considers the motivations behind (remote) collaboration¹, and the types of collaboration it enables. It then reviews the framework proposed by McKenzie and van Winkelen (McKenzie and van Winkelen, 2001) for working within e-Collaboration Space. The term e-Collaboration will be used as shorthand for remote collaboration, but many of the principles can also be applied to local collaboration.

¹ The term “remote” is removed in the sequel.

54.2.1 E-Collaboration: Motivations and Forms

The main motivation for collaboration (Moyle et al., 2003) is to harness dispersed expertise and to enable knowledge sharing and learning in a manner that builds intellectual capital (Edvinsson and Malone, 1997). This offers tantalizing potential rewards including boosting innovation, flexible resource management, and reduced risk (Amara, 1990, Mowshowitz, 1997, Nohria and Eccles, 1993, Snow et al., 1996), but these rewards are offset by numerous difficulties, mainly due to the increased complexity of a virtual environment.

54 Collaborative Data Mining 1031

In (McKenzie and van Winkelen, 2001), seven distinct forms of e-collaborating organizations are identified, which can be distinguished either by their structure or by the intent behind their formation. These are: 1) virtual/smart organizations; 2) a community of interest and practice; 3) a virtual enterprise; 4) virtual teams; 5) a community of creation; 6) collaborative product commerce or customer communities; and 7) virtual sourcing and resource coalitions.
For collaborative Data Mining, forms 4 and 5 are most relevant. These forms are summarized below.

• Virtual teams are temporary, culturally diverse, geographically dispersed work groups that communicate electronically. These can be smaller entities within virtual enterprises, or within a transnational organization. They can be characterized by changing membership and multiple organizational contexts.
• A community of creation revolves around a central firm and shares its knowledge for the purpose of innovation. This structure consists of individuals and organizations with ever changing boundaries.

Recognizing the collaboration form makes it possible to analyze the difficulties that might be encountered. Such an analysis can be performed with respect to the e-collaboration space model described in the next section.

54.2.2 E-Collaboration Space

Each type of e-collaboration form can be usefully analyzed with respect to McKenzie and van Winkelen’s e-Collaboration Space model (McKenzie and van Winkelen, 2001). This model places each form in the space according to its location on three dimensions: number of boundaries crossed, task, and relationships.

• Boundaries crossed: The more boundaries that are crossed in e-collaboration, the more barriers to a successful outcome are present. All communication takes place across some boundary (Wilson, 2002). Fewer boundaries between agents lead to a lower risk of misunderstanding. In e-collaboration the number of boundaries is automatically increased. Influential boundaries to successful e-collaboration are: technological, temporal, organizational, and cultural.
• Task: The nature of the tasks involved in the collaborative project is influenced by the complexity of the processes, the uncertainty of the available information and outcomes, and the interdependence of the various stages of the task. The complexity can be broadly classified as linear (step-by-step processes) or non-linear. The interdependence of a task relates to whether it can be decomposed into subtasks which can be worked on independently by different participants.
• Relationships: Relationships are key to any successful collaboration. When electronic communication is the only mode of interaction it is harder for relationships to form, because the instinctive reading of signals that establish trust and mutual understanding is less accessible to participants.

For the remainder of the chapter only the dimension of task will be highlighted within the e-collaboration space model. As will be described in the next sub-section, task complexity makes collaborative Data Mining risk prone.

54.2.3 Collaborative Data Mining in E-Collaboration Space

Different forms of e-collaboration – as measured relative to the dimensions of task, boundaries, and relationships – can be viewed as locations in a three dimensional e-collaboration space. The location of a collaborative Data Mining project depends on the actual setting of such a project. The most well defined dimension with respect to the Data Mining process (refer back to section 60.2.1) is that of task.

The task complexity of Data Mining is high. Not only is there a high level of expertise involved in a Data Mining project, but there is also the risk that in reaching the final solution(s), much effort will appear – in hindsight – to have been wasted. Data miners have long understood the need for a methodology to support the Data Mining process (Adriaans and Zantinge, 1996, Fayyad et al., 1996, Chapman et al., 2000). All these methodologies are explicit that the Data Mining process is non-linear, and warn that information uncovered in later phases can invalidate assumptions made in earlier phases. As a result the previous phases may need to be re-visited.
To exacerbate the situation, Data Mining is by its very nature a speculative process – there may be no valuable information contained in the data sources at all, or the techniques being used may not have sufficient power to uncover it.

A typical Data Mining project at the start of the collaboration is summarized with respect to the e-collaboration model in Table 54.1.

Table 54.1. The position of a dispersed collaborative Data Mining project in E-collaboration space († potential boundary depending on situation).

Task                                    Boundaries Crossed        Relationships
High                                    High                      Medium High
- Complex non-linear interdependencies  - Medium technological    - Medium commonality of view
- Uncertainty                           - temporal †              - Medium duration of existing relationship
                                        - geographical            - Medium duration of collaboration
                                        - large organizational †
                                        - cultural †

54.3 The Data Mining Process

Data Mining processes broadly consist of a number of phases. These phases, however, are interrelated and are not necessarily executed in a linear manner. For example, the results of one phase may uncover more detail relating to an earlier phase and may force more effort to be expended on a phase previously thought complete.

The CRISP-DM methodology (CRoss Industry Standard Process for Data Mining; Chapman et al., 2000) is an attempt to standardise the process of Data Mining. In CRISP-DM, six interrelated phases are used to describe the Data Mining process: business understanding, data understanding, data preparation, modelling, evaluation, and deployment (Figure 54.1).

The main outputs of the business understanding phase are the definition of business and Data Mining objectives as well as business and Data Mining evaluation criteria. In this phase an assessment of resource requirements and an estimation of risk are performed. In the data understanding phase data are collected and characterized. Data quality is also assessed. During data preparation, tables, records and attributes are selected and transformed for modelling.
Modelling is the process of extracting input/output patterns from given data and deriving models, typically mathematical or logical models. In the modelling phase, various techniques (e.g. association rules, decision trees, logistic regression, k-means clustering) are selected and applied, and their parameters are calibrated – or tuned – to optimal values. Different models are compared, and possibly combined. In the evaluation phase models are selected and reviewed according to the business criteria. The whole Data Mining process is reviewed and a list of possible actions is elaborated. In the last phase, deployment is planned, implemented, and monitored. The entire project is typically documented and summarized in a report. The CRISP-DM handbook (Chapman et al., 2000) describes in detail how each of the main phases is subdivided into specific tasks, with clearly defined predecessors/successors, and inputs/outputs.

Fig. 54.1. The CRISP-DM cycle

54.4 Collaborative Data Mining Guidelines

The CRISP-DM Data Mining process described in the preceding section can be adopted by Data Mining agents collaborating remotely on a particular Data Mining project (SolEuNet, 2002, Flach et al., 2003). Not all of the CRISP-DM methodology can be performed entirely in a collaborative setting. Business understanding, for instance, requires intense close contact with the business environment for which the Data Mining is being performed. The phases that can most easily be performed in a remote-collaborative fashion are data preparation and modelling. The other phases can nevertheless benefit from a collaborative approach.

Although many of the specific tasks can be carried out independently, care must be taken by the participants to ensure that efforts are not wasted. Principles to guide the process of collaboration should be established in advance of a collaborative Data Mining project.
For instance, individual agents must communicate or share any intermediate results – or improvements in the current best understanding of the Data Mining problem – so that all agents have the new knowledge. Providing a catalogue of up-to-date knowledge about the problem assists new agents entering the Data Mining project. Furthermore, procedures are required for how results from different agents are compared, and ultimately combined, so that the combined value of the efforts is greater than the sum of the individual components.

54.4.1 Collaboration Principles

(Moyle et al., 2003) present a framework for collaborative Data Mining, involving both principles and technological support. Collaborative groupware technology, with specific functionality to support Data Mining, is described in (Vo et al., 2001). Principles for collaborative Data Mining are outlined as follows (Moyle et al., 2003).

1. Requisite management. Sufficient management processes should be established. In particular the definition and objectives of the Data Mining problem should be clear from the start of the project to all participants. An infrastructure ensuring information flows within the network of agents should be provided.
2. Problem solving freedom. Agents should use their expertise and tools to execute Data Mining tasks to solve the problem in the manner they find best.
3. Start any time. All the necessary information about the Data Mining problem should be captured and made available to participants at all times. This includes problem definition, data, evaluation criteria, and any knowledge produced.
4. Stop any time. Participants should work on their solutions so that a working solution – however crude – is available whenever a stop signal is issued. These solutions will typically be Data Mining models. One approach is to try simpler modeling techniques first (Holte, 1993).
5. Online knowledge sharing. The knowledge about the Data Mining problem gained by each participant at each phase should be shared with all participants in a timely manner.
6. Security. Data and information about the Data Mining problem may contain sensitive information and must not be revealed outside the project. Access to information must be controlled.

Having established a collaborative Data Mining project with appropriate principles and support, how can the results of the Data Mining efforts be compared and combined so that their value is maximized? The next section addresses this question.

54.4.2 Data Mining model evaluation and combination

Data Mining models are among the main outputs of the Data Mining process (Chapman et al., 2000). These may take many forms, including decision trees, rules, artificial neural networks, and regression equations (see (Mitchell, 1997) for an introduction to machine learning, and (Hair et al., 1998) for an introductory statistics text). Different agents may produce models in different forms, which requires methods for both evaluating them and combining them.

When multiple agents produce multiple models as the result of a Data Mining effort, a process for evaluating their relative merits must be established. Such processes are well defined in Data Mining challenge problems (e.g. (Srinivasan et al., 1999, Page and Hatzis, 2001)). For example, a challenge recipe for the production of classificatory models can be found in (Moyle and Srinivasan, 2001).

To ensure accurate comparisons, models built by different agents must be evaluated in exactly the same way, on the same data. This sounds like an obvious statement, but agents can easily make adjustments to their copy of the data to suit their particular approaches, without making the changes available to the other agents. This makes any model evaluation and comparison extremely difficult.
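One way to detect that an agent's private copy of the evaluation data has drifted from the shared copy is to compare content fingerprints before accepting any scores. The sketch below is only an illustration and is not part of the chapter's framework: the function name `dataset_fingerprint`, the toy records, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest of the rows, in the order given."""
    digest = hashlib.sha256()
    for row in rows:
        # The unit-separator character is assumed not to occur in the data.
        line = "\x1f".join(str(field) for field in row)
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical shared test set agreed on by all agents.
shared_test_set = [(5.1, 3.5, "pos"), (4.9, 3.0, "neg"), (6.2, 2.9, "pos")]
# One agent silently drops a record from its private copy.
private_copy = shared_test_set[:2]

reference = dataset_fingerprint(shared_test_set)
same_data = dataset_fingerprint(list(shared_test_set)) == reference
modified_data = dataset_fingerprint(private_copy) == reference
```

Here `same_data` is true and `modified_data` is false, so a score computed on the private copy would be flagged as not comparable. The fingerprint is order-sensitive by design; if row order is not meaningful, the rows would need to be sorted first.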
Furthermore, the evaluation criterion or criteria (there may be several) deemed most appropriate may change during the knowledge discovery process. For instance, at some point one may wish to redefine the data set on which models are evaluated (e.g. because it is found to contain outliers that make the evaluation procedure inaccurate) and re-evaluate previously built models. In (Blockeel and Moyle, 2002) it is discussed how this evaluation and re-evaluation leads to significant extra effort for the different agents and consequently is a barrier to the knowledge discovery process, unless adequate software support is provided.

One approach to controlling model evaluation is to centralize the process. Consider an abstracted Data Mining process where agents first tune their modeling algorithm (which outputs the algorithm and its parameter settings, I), before building a final model (which is output as M). The agent then uses the model to predict the labels on a test set (producing predictions, P), from which an overall evaluation of the model (resulting in a score S) is determined. The point at which these outputs are published for all agents to access depends on the architecture of the evaluation system, as shown in Figure 54.2. A single evaluation agent provides the evaluation procedures; different agents submit information on their models to this agent, which stores this information and automatically evaluates it according to all relevant criteria. If criteria change, the evaluation agent automatically re-evaluates previously submitted models. In such a framework, information about produced models can be submitted at several levels, as illustrated in Figure 54.2.
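The centralized scheme can be illustrated with a minimal sketch of the prediction-level case, in which agents submit predictions (P) and the evaluation agent derives scores (S). The class `EvaluationAgent`, its methods `submit` and `set_criteria`, and the two criteria are hypothetical names chosen for illustration; they are not taken from (Blockeel and Moyle, 2002).

```python
class EvaluationAgent:
    """Hypothetical central evaluator: stores predictions (P), derives scores (S)."""

    def __init__(self, test_labels, criteria):
        self.test_labels = list(test_labels)
        self.criteria = dict(criteria)  # criterion name -> function(labels, predictions)
        self.submissions = {}           # agent name -> predictions on the test set
        self.scores = {}                # agent name -> {criterion name: score}

    def submit(self, agent, predictions):
        predictions = list(predictions)
        if len(predictions) != len(self.test_labels):
            raise ValueError("one prediction is required per test example")
        self.submissions[agent] = predictions
        self.scores[agent] = self._evaluate(predictions)

    def set_criteria(self, criteria):
        """Replace the criteria and re-evaluate every stored submission."""
        self.criteria = dict(criteria)
        for agent, predictions in self.submissions.items():
            self.scores[agent] = self._evaluate(predictions)

    def _evaluate(self, predictions):
        return {name: fn(self.test_labels, predictions)
                for name, fn in self.criteria.items()}


def accuracy(labels, predictions):
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def false_positive_rate(labels, predictions):
    # Assumes binary labels (0/1) and at least one negative example.
    on_negatives = [p for y, p in zip(labels, predictions) if y == 0]
    return sum(p == 1 for p in on_negatives) / len(on_negatives)


evaluator = EvaluationAgent(test_labels=[1, 0, 1, 0], criteria={"accuracy": accuracy})
evaluator.submit("agent_a", [1, 0, 0, 0])
evaluator.submit("agent_b", [1, 1, 1, 0])
# The criteria change mid-project; earlier submissions are re-scored automatically.
evaluator.set_criteria({"accuracy": accuracy, "fpr": false_positive_rate})
```

Because the test labels and the criteria live only in the evaluation agent, every submission is scored in the same way on the same data, and a change of criteria costs the submitting agents no extra work.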
Agents can run their own models on a test set and send only predictions to the evaluation agent (assuming evaluation is based on predictions only), they can submit descriptions of the models themselves, or they can send a complete description of the model-producing algorithm and its parameters to an evaluation agent that has been augmented with modeling algorithms. These respective options offer increased centralization and increasingly flexible evaluation possibilities, but also involve increasingly sophisticated software support (Blockeel and Moyle, 2002).

Communicating Data Mining models to the evaluation agent can be performed using a standard format. For instance, in (Flach et al., 2003) models from multiple agents were submitted in a standard XML-style format, the Predictive Model Markup Language (PMML) (The Data Mining Group, 2003). Such a procedure has been adopted for a real-world collaborative Data Mining project (Flach et al., 2003).

Model combination is not always possible. However, when restricted to binary-classificatory models it is possible to utilize Receiver Operating Characteristic (ROC) curves (Provost and Fawcett, 2001) to assist both model comparison and model combination. ROC analysis plots different binary-classification models in a two dimensional space with respect to the type of errors the models make – false positive errors, and false negative errors². The actual performance of a model at run-time depends on the costs of errors at run-time, and the distribution of the classes at run-time. The values of these run-time parameters – or operating characteristics – determine the optimal model(s) for use in prediction. ROC analysis enables models to be compared, which may reveal that some models are never optimal under any operating conditions; such models can be discarded. The remaining models are those that are located on the ROC convex hull (ROCCH).
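The ROCCH selection described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not code from (Provost and Fawcett, 2001): each model is summarized by a single (false positive rate, true positive rate) point, the model names and rates are invented, and the function names `roc_convex_hull`, `expected_cost`, and `best_model` are hypothetical.

```python
def roc_convex_hull(models):
    """Return (name, point) pairs on the upper convex hull in ROC space.

    `models` maps a model name to (false positive rate, true positive rate).
    """
    points = dict(models)
    # The trivial classifiers anchor the hull at (0, 0) and (1, 1).
    points.setdefault("always_negative", (0.0, 0.0))
    points.setdefault("always_positive", (1.0, 1.0))
    ordered = sorted(points.items(), key=lambda item: item[1])

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    hull = []
    for name, point in ordered:
        # Drop earlier points that lie on or below the line to the new point.
        while len(hull) >= 2 and cross(hull[-2][1], hull[-1][1], point) >= 0:
            hull.pop()
        hull.append((name, point))
    return hull


def expected_cost(point, positive_rate, cost_fp, cost_fn):
    """Expected cost per example under the given operating conditions."""
    fpr, tpr = point
    return positive_rate * cost_fn * (1 - tpr) + (1 - positive_rate) * cost_fp * fpr


def best_model(hull, positive_rate, cost_fp, cost_fn):
    """Name of the hull model with the lowest expected cost."""
    return min(hull, key=lambda item: expected_cost(item[1], positive_rate,
                                                    cost_fp, cost_fn))[0]


# Hypothetical models submitted by four agents.
models = {"A": (0.1, 0.5), "B": (0.3, 0.8), "C": (0.4, 0.6), "D": (0.5, 0.95)}
hull = roc_convex_hull(models)
hull_names = [name for name, _ in hull]
equal_costs = best_model(hull, positive_rate=0.5, cost_fp=1, cost_fn=1)
costly_misses = best_model(hull, positive_rate=0.5, cost_fp=1, cost_fn=5)
```

With these invented rates, model C lies below the hull and can be discarded. Model B has the lowest expected cost when both error types cost the same, while model D is preferred once a false negative is five times as costly as a false positive.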
As well as determining non-optimal models, ROC analysis can be used to combine models. One method is to use two adjacent models on the ROCCH, located either side of the operating condition, in combination to make run-time predictions. Another approach to using the ROCCH is to modify a single model into multiple models, which can then be plotted in ROC space (Flach et al., 2001), resulting in models that fit a broader range of operating conditions.

(Wettschereck et al., 2003) describe a support system that performs model evaluation, model visualization, and model comparison, which has been applied in a collaborative Data Mining setting (Flach et al., 2003).

² The axes on an ROC curve are actually the true positive rate versus the false positive rate.

Fig. 54.2. Two different architectures for model evaluation. The path finishing in dashed arrows depicts agents in charge of building and evaluating their own models before publishing their results centrally. The path of solid arrows depicts Data Mining agents submitting their models to a centralized evaluation agent which provides the services of executing submitted models on a test set, evaluating the predictions to produce scores, and then publishing the results. The information submitted to the central evaluation agent is: I = algorithm and parameter settings to produce models; M = models; P = predictions made by the models on a test set; S = scores of the value of the models.

54.5 Discussion

References containing the keywords collaborative, Data Mining, collaboration partition naturally into the following categories.

• Multiple software agents applying Data Mining algorithms to solve the same problem (e.g. (Ramakrishnan, 2001)): this presupposes that the Data Mining task and its associated data are well defined a priori.
• Humans using modern collaboration techniques to apply Data Mining to a single, defined problem (e.g. (Mladenic et al., 2003)).
• Data Mining the artifacts of human collaboration (e.g. (Biuk-Aghai and Simoff, 2001)): these artifacts are typically the conversations and associated documents collected via some electronic discussion forum.
• The collaboration process itself resulting in increased knowledge: a form of knowledge growth by collection within a context.
• Grid style computing facilities collaborating to provide resources for Data Mining (e.g. (Singh et al., 2003)): these resources typically provide either federated data or distributed computing power.
• Integrating Data Mining techniques into business process software (e.g. (McDougall, 2001)): for example Enterprise Resource Planning systems, and groupware. Note that this, too, implies a priori knowledge of which Data Mining problems are to be solved.

This chapter focused mainly on the second item – that of humans using collaboration techniques to apply Data Mining to a single task. With sufficient definition of a particular Data Mining problem, this can lead to a multiple software agent Data Mining framework (the first item), although this is not the aim of this chapter.

Many Data Mining challenges have been issued, which by their nature always result in “winners” and “losers”. However, in collaborative approaches, much can be learned from the losers as the Data Mining projects proceed. Much initial effort is required to establish a Data Mining challenge (e.g. problem specification, data collection and preprocessing, specification of evaluation criteria) – even before the participants register. This effort also needs to be expended in a collaborative setting so that the objectives of the Data Mining project are clearly articulated in advance.

The collaborative methodology and techniques described here have been applied with mixed success in Data Mining projects (Flach et al., 2003, Štěpánková et al., 2003, Jorge et al., 2003).
Further development of collaborative Data Mining processes, supporting tools, and communication environments is likely to improve the results of harnessing dispersed Data Mining expertise.

54.6 Conclusions

Collaborative Data Mining is more difficult than the single team setting. Data Mining benefits from adhering to established processes. One key notion in Data Mining methodologies is that of understanding (e.g. CRISP-DM contains the phases business understanding and data understanding). How are such understandings produced, articulated, maintained, and communicated to all collaborating agents? What happens when understandings change – how much of the Data Mining process will need re-work? How does one agent’s understanding differ from another’s, simply due to communication, language and cultural differences?

Practitioners embarking on collaborative Data Mining might wish to heed some of the lessons learned from other collaborative Data Mining projects:

• Analyze the form of collaboration proposed and understand how difficult it is likely to be.
• Establish a methodology that all participants can utilize along with support tools and technologies.
• Ensure that all results – intermediate or otherwise – are recorded, and shared in a timely manner.
• Encourage competition among participants.
• Define metrics for success at all stages.
• Define model evaluation and combination procedures.

References

Adriaans, P., and Zantinge, D. Data Mining. Addison-Wesley, New York, 1996.
Amara, R. New directions for innovations. Futures 22(2): p. 142-152, 1990.
Bacon, F. Novum Organum, eds. P. Urbach and J. Gibson. Open Court Publishing Company, 1994.
Biuk-Aghai, R.P. and S.J. Simoff. An integrative framework for knowledge extraction in collaborative virtual environments. In The 2001 International ACM SIGGROUP Conference on Supporting Group Work. Boulder, Colorado, USA, 2001.
Blockeel, H. and S.A. Moyle. Collaborative Data Mining needs centralised model evaluation. In Proceedings of the ICML-2002 Workshop on Data Mining Lessons Learned. The University of New South Wales, Sydney, 2002.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R. CRISP-DM 1.0: Step-by-step data mining guide. The CRISP-DM consortium, 2000.
Edvinsson, L. and Malone, M.S. Intellectual Capital: Realizing Your Company’s True Value by Finding Its Hidden Brainpower. HarperBusiness, New York, USA, 1997.
Fayyad, U., et al., eds. Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.
Flach, P.A., et al. Decision support for Data Mining: introduction to ROC analysis and its application. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors. Kluwer Academic Publishers, 2003.
Flach, P., Blockeel, H., Gaertner, T., Grobelnik, M., Kavsek, B., Kejkula, M., Krzywania, D., Lavrac, N., Mladenic, D., Moyle, S., Raeymaekers, S., Rauch, J., Ribeiro, R., Sclep, G., Struyf, J., Todorovski, L., Torgo, L., Wettschereck, D., and Wu, S. On the road to knowledge: mining 21 years of UK traffic accident reports. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors. Kluwer Academic Publishers, 2003.
Hair, J.F., Anderson, R.E., Tatham, R.L., and Black, W.C. Multivariate Data Analysis. Prentice Hall, 1998.
Holte, R.C. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 3: p. 63-91, 1993.
Jorge, J., Alves, M.A., Grobelnik, M., Mladenic, D., and Petrak, J. Web site access analysis for a national statistical agency. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors, p. 157-166. Kluwer Academic Publishers, 2003.
Kuhn, T.S. The structure of scientific revolutions. 2nd, enlarged ed. 1962, University of Chicago Press, Chicago, 1970.
McDougall, P. Companies that dare to share information are cashing in on new opportunities. InformationWeek, May 7, 2001.
McKenzie, J. and C. van Winkelen. Exploring E-collaboration Space. In the proceedings of The first annual Knowledge Management Forum Conference. Henley Management College, 2001.
Mitchell, T. Machine Learning. Department of Computer Science, Carnegie Mellon University. McGraw-Hill Book Company, Pittsburgh, 1997.
Mladenic, D., Lavrac, N., Bohanec, M., and Moyle, S., editors. Data Mining and Decision Support: Integration and Collaboration. Kluwer Academic Publishers, 2003.
Mowshowitz, A. Virtual Organization. Communications of ACM 40(9): p. 30-37, 1997.
Moyle, S.A. and Srinivasan, A. Classificatory challenge-Data Mining: a recipe. Informatica 25(3): p. 343-347, 2001.
Moyle, S., J. McKenzie, and A. Jorge. Collaboration in a Data Mining virtual organization. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors. Kluwer Academic Publishers, 2003.
Nohria, N. and R.G. Eccles, eds. Network and organizations; structure form and action. Harvard Business School Press, Boston, 1993.
Page, C.D. and C. Hatzis. KDD Cup 2001. University of Wisconsin, http://www.cs.wisc.edu/~dpage/kddcup2001/, 2001.
Popper, K. The Logic of Scientific Discovery. Routledge, 1977.
Provost, F. and T. Fawcett. Robust Classification for Imprecise Environments. Machine Learning 42: p. 203-231, 2001.
Ramakrishnan, R. Mass Collaboration and Data Mining (keynote address). In The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001). San Francisco, California, 2001.
Singh, R., Leigh, J., DeFanti, T.A., and Karayannis, F. TeraVision: a High Resolution Graphics Streaming Device for Amplified Collaboration Environments. Journal of Future Generation Computer Systems (FGCS) 19(6): p. 957-972, 2003.
Snow, C.C., S.A. Snell, and S.C. Davison. Using transnational teams to globalize your company. Organizational Dynamics 24(4): p. 50-67, 1996.
SolEuNet. The Solomon European Network – Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise. http://soleunet.ijs.si/, 2002.
Soukhanov, A., ed. Microsoft Encarta College Dictionary: The First Dictionary for the Internet Age. St. Martin’s Press, 2001.
Srinivasan, A., R.D. King, and D.W. Bristol. An assessment of submissions made to the Predictive Toxicology Evaluation Challenge. In Proceedings of the Sixteenth International Conference on Artificial Intelligence (IJCAI-99). Morgan Kaufmann, Los Angeles, CA, 1999.
Štěpánková, O., J. Kléma, and P. Mikšovský. Collaborative Data Mining with RAMSYS and Sumatra TT: Prediction of resources for a health farm. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors, p. 215-227. Kluwer Academic Publishers, 2003.
The Data Mining Group. The Predictive Model Markup Language (PMML). http://www.dmg.org/, 2003.
Vo, A., Richter, G., Moyle, S., and Jorge, A. Collaboration support for virtual data mining enterprises. In 3rd International Workshop on Learning Software Organizations (LSO’01). Springer-Verlag, 2001.
Wettschereck, D., A. Jorge, and S. Moyle. Visualisation and Evaluation Support of Knowledge Discovery through the Predictive Model Markup Language. In 7th International Knowledge-Based Intelligent Information and Engineering Systems (KES 2003), Oxford. Springer-Verlag, 2003.
Wilson, T.D. The nonsense of knowledge management. Information Research 8(1), 2002.
Witten, I.H. and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, 2000.