Investigative Data Mining for Security and Criminal Detection (2003)


Investigative Data Mining for Security and Criminal Detection
by Jesús Mena
ISBN: 0750676132
Butterworth Heinemann © 2003 (452 pages)

This text introduces security professionals, intelligence and law enforcement analysts, and criminal investigators to the use of data mining as a new kind of investigative tool, and outlines how data mining technologies can be used to combat crime.

Table of Contents

Investigative Data Mining for Security and Criminal Detection
Introduction
Chapter 1 - Precrime Data Mining
Chapter 2 - Investigative Data Warehousing
Chapter 3 - Link Analysis: Visualizing Associations
Chapter 4 - Intelligent Agents: Software Detectives
Chapter 5 - Text Mining: Clustering Concepts
Chapter 6 - Neural Networks: Classifying Patterns
Chapter 7 - Machine Learning: Developing Profiles
Chapter 8 - NetFraud: A Case Study
Chapter 9 - Criminal Patterns: Detection Techniques
Chapter 10 - Intrusion Detection: Techniques and Systems
Chapter 11 - The Entity Validation System (EVS): A Conceptual Architecture
Chapter 12 - Mapping Crime: Clustering Case Work
Appendix A - 1,000 Online Sources for the Investigative Data Miner
Appendix B - Intrusion Detection Systems (IDS) Products, Services, Freeware, and Projects
Appendix C - Intrusion Detection Glossary
Appendix D - Investigative Data Mining Products and Services
Index
List of Figures
List of Tables

Back Cover

Investigative Data Mining for Security and Criminal Detection is the first book to outline how data mining technologies can be used to combat crime in the 21st century. It introduces security managers, law enforcement investigators, counter-intelligence agents, fraud specialists, and information security analysts to data mining techniques and shows how they can be used as investigative tools. Readers will learn how to search public and private databases and networks to flag potential security threats and root out criminal activities even before they occur.

This groundbreaking book reviews the latest data mining technologies including intelligent agents, link analysis, text mining, decision trees, self-organizing maps, machine learning, and neural networks. Using clear, understandable language, it explains the application of these technologies in such areas as computer and network security, fraud prevention, crime prevention, and national defense. International case studies throughout the book further illustrate how these technologies can be used to aid in crime prevention. The book will also serve as an indispensable resource for software developers and vendors as they design new products for the law enforcement and intelligence communities.

Key Features:
• Introduces cutting-edge technologies in evidence gathering and collection, using clear, non-technical language
• Illustrates current and future applications of data mining tools in preventative law enforcement, homeland security, and other areas of crime detection and prevention
• Shows how to construct predictive models for detecting criminal activity and for behavioral profiling of perpetrators
• Features numerous Web links, vendor resources, case studies, and screen captures illustrating the use of artificial intelligence (AI) technologies

About the Author

Jesús Mena is a data mining consultant and a former artificial intelligence specialist for the Internal Revenue Service (IRS) in the U.S. He has over 15 years of experience in the field and is the author of the best-selling Data Mining Your Website and WebMining for Profit. His articles have been widely published in key publications in the information technology, Internet, marketing, and artificial intelligence fields.

Investigative Data Mining for Security and Criminal Detection
Jesús Mena
An imprint of Elsevier Science
www.bh.com
Amsterdam • Boston • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Copyright © 2003, Elsevier Science (USA). All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. All trademarks found herein are property of their respective owners.

Recognizing the importance of preserving what has been written, Elsevier Science prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 0-7506-7613-2

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

The publisher offers special discounts on bulk orders of this book. For information, please contact:
Manager of Special Sales
Elsevier Science
200 Wheeler Road
Burlington, MA 01803
Tel: 781-313-4700
Fax: 781-313-4882

For information on all Butterworth Heinemann publications available, contact our World Wide Web home page at: http://www.bh.com.

10 9 8 7 6 5 4 3 2 1
Printed in the United States of America

To Deirdre

Introduction

During congressional hearings regarding the intelligence failures of the 9/11 attacks, FBI director Robert S. Mueller indicated that the primary problem the top law enforcement agency in the world had was that it focused too much on dealing with crime after it had been committed and placed too little emphasis on preventing it. The director said the bureau has been too involved in investigating, and not involved enough in analyzing the information its investigators gathered, which is what this book is specifically about: the prevention of crime and terrorism before it takes place (precrime), using advanced data mining technologies, tools, and techniques.

The FBI director went on to tell Congress that the bureau would shift its focus from reacting to crime to preventing it, acknowledging that this could be done only with better technology, which, again, is what this book is about, specifically:
• Data integration for access to multiple and diverse sources of information
• Link analysis for visualizing criminal and terrorist associations and relations
• Software agents for monitoring, retrieving, analyzing, and acting on information
• Text mining for sorting through terabytes of documents, Web pages, and e-mails
• Neural networks for predicting the probability of crimes and new terrorist attacks
• Machine-learning algorithms for extracting profiles of perpetrators and graphical maps of crimes

This book strives to explain the technologies and their applications in plain English, staying clear of the math and concentrating instead on how they work and how they can be used by law enforcement investigators, counter-intelligence and fraud specialists, information technology security personnel, military and civilian security analysts, and decision makers responsible for protecting property, people, systems, and nations. These are individuals who may have experience in criminology, criminal analysis, and other forensic and counter-intelligence techniques, but have little experience with data and behavioral analysis, modeling, and prediction. Whenever possible, case studies are provided to illustrate how data mining can be applied to precrime.

Ironically, a week after this manuscript was submitted to the publisher, this headline appeared in Federal Computer Week: "Investigative Data Mining Part of Broad Initiative to Fight Terrorism" (June 3, 2002). The story went on to announce:

The FBI has selected 'investigative data warehousing' as a key technology to use in the war against terrorism. The technique uses data mining and analytical software to comb vast amounts of digital information to discover patterns and relationships that indicate criminal activity.

Investigative data mining in an increasingly digital and networked world will become crucial in the prevention of crime, not only for the bureau, but also for other investigators and analysts in private industry and government, where the focus will be on more and better analytical capabilities, combining the intelligence of humans and machines. The precision of this type of data analysis will ensure that the privacy and security of the innocent are protected from intrusive inquiries.

This is the first book on this new type of forensic data analysis, covering its technologies, tools, techniques, modus operandi, and case studies. Those case studies will continue to be developed by innovative investigators and analysts, from whom I would like to hear at: <mail@jesusmena.com>

Data mining and information sharing techniques are principal components of the White House's national strategy for homeland security.

Chapter 1: Precrime Data Mining

1.1 Behavioral Profiling

With every call you make on your cell phone and every swipe of your debit and credit cards, a digital signature of when, what, and where you call or buy is incrementally built every second of every day in the servers of your credit card provider and wireless carrier. Models created with data mining technologies monitor these digital signatures, a kind of consumer DNA, looking for deviations from the norm; once a deviation is spotted, they instantly issue silent alerts to monitor your card or phone for potential theft. This is nothing new; it has been taking place for years. What is different since 9/11 is that this use of data mining will take an even more active role in the areas of criminal detection, security, and behavioral profiling.
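The deviation-from-the-norm monitoring described above can be illustrated with a minimal sketch. It is not the method used by any card issuer or carrier; it simply assumes that an account's "norm" is the mean of its past purchase amounts and that a deviation is anything more than a chosen number of standard deviations away. The function name `flag_deviations`, the threshold of 3, and the sample amounts are all hypothetical.

```python
from statistics import mean, stdev

def flag_deviations(history, new_amounts, threshold=3.0):
    """Return the new amounts that deviate from the account's past norm.

    Assumption: the norm is the mean of `history`, and a deviation is any
    amount more than `threshold` sample standard deviations from it.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical, so any different amount deviates.
        return [x for x in new_amounts if x != mu]
    return [x for x in new_amounts if abs(x - mu) / sigma > threshold]

# Hypothetical purchase history for one card, then three new purchases.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75, 39.0, 49.5]
alerts = flag_deviations(history, [45.0, 950.0, 41.0])
print(alerts)  # prints [950.0]
```

A production model would look at many attributes at once (time, place, merchant type) rather than a single amount, but the principle of comparing new activity against a learned baseline is the same.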
Behavioral profiling is not racial profiling, which is not only illegal, but a crude and ineffective process. Racial profiling simply does not work; race is just too broad a category to be useful; it is one-dimensional. What is important, however, is suspicious behavior and the related digital information found in diverse databases, which data mining can be used to analyze and quantify. Behavioral profiling is the capability to recognize patterns of criminal activity, to predict when and where crimes are likely to take place, and to identify their perpetrators. Precrime is not science fiction; it is the objective of data mining techniques based on artificial intelligence (AI) technologies.

The same data mining technologies that have been used by marketers to provide personalization, which is the exact placement of the right offer to the right person at the right time, can be used for providing the right inquiry to the right perpetrators at the right time, before they commit crimes. Investigative data mining is the visualization, organization, sorting, clustering, segmenting, and predicting of criminal behavior, using such data attributes as age, previous arrests, modus operandi, type of building, household income, time of day, geo code, countries visited, housing type, auto make, length of residency, type of license, utility usage, IP address, type of bank account, number of children, place of birth, average usage of ATM card, number of credit cards, etc.; the data points can run into the hundreds.

Precrime is the interactive process of predicting criminal behavior by mining this vast array of data, using several AI technologies:
• Link analysis for creating graphical networks to view criminal associations and interactions
• Intelligent agents for retrieving, monitoring, organizing, and acting on case-related information
• Text mining for examining gigabytes of documents in search of concepts and key words
• Neural networks for recognizing the patterns of criminal behavior and anticipating criminal activity
• Machine-learning algorithms for extracting rules and graphical maps of criminal behavior and perpetrator profiles

1.2 Rivers of Scraps

"It's not going to be a cruise missile or a bomber that will be the determining factor," Defense Secretary Donald Rumsfeld said over and over in the days following September 11. "It's going to be a scrap of information." Make that multiple scraps, millions of them, flowing in a digital river of information at the speed of light from servers networked across the planet. Rumsfeld is right: the landscape of battle has changed forever, and so have the weapons, if commercial airliners can become missiles. So has the way we use one of the most ethereal technologies of all human creativity and imagination: AI.

AI in the form of text-mining robots scanning and translating terabyte databases able to detect deception, 3-D link analysis networks correlating human associations and interpersonal interactions, biometric identification devices monitoring for suspected chemicals, powerful pattern recognition neural networks looking for the signature of fraud, silent intrusion detection systems monitoring keystrokes, autonomous intelligent agent software retrieving e-mails able to sense emotions, real-time machine-learning profiling systems sitting in chat rooms: all of these are bred from (and fostering) a new type of alien intelligence. These are the weapons and tools for criminal investigations of today and tomorrow, whether we like it or not.
Which of the 1.5 million people who cross U.S. borders each day is the courier for a smuggling operation? Which respected merchant on ebay.com is about to abandon successful auction bidders, skipping out with hundreds of thousands of dollars? What tiny shred of the world's $1.5 trillion in daily foreign exchange transactions is the payment from an al-Qaeda cell for a loose Russian nuke? How many failed password attempts to log into a network are a sign of an organized intrusion attack? Finding the needles in these types of moving haystacks and the answers to these kinds of questions is where data mining can be used to anticipate crimes and terrorist attacks.

1.3 Data Mining

Data mining is the fusion of statistical modeling, database storage, and AI technologies. Statisticians have been using computers for decades as a means to prove or disprove hypotheses on collected data. In fact, one of the largest software companies in the world "rents" its statistical programs to nearly every government agency and major corporation in the United States: SAS. Linear regressions and other types of modeling analyses are common and have been used in everything from the drug approval process by the Food and Drug Administration to the credit rating of individuals by financial service providers.

Another element in the development of data mining is the increasing capacity for data storage. In the 1970s, most data storage depended upon COBOL programs and storage systems not conducive to easy data extraction for inductive data analysis. Today, however, organizations can store and query terabytes of information in sophisticated data warehouse systems. In addition, the development of multidimensional data models, such as those used in a relational database, has allowed users to move from a transactional view of customers to a more dynamic and analytical way of marketing and retaining their most profitable clients.

The final element in data mining's evolution is AI. During the 1980s machine-learning algorithms were designed to enable software to learn; genetic algorithms were designed to evolve and improve autonomously; and, of course, during that decade, neural networks came into acceptance as powerful programs for classification, prediction, and profiling. During the past decade, intelligent agents were developed that were able to incorporate autonomously all of these AI functions and use them to go out over networks and the Internet to scrounge the planet for information their masters programmed them to retrieve. When combined, these AI technologies enable the creation of applications designed to listen, learn, act, evolve, and identify anything from a potentially fraudulent credit card transaction to the detection of tanks from satellites, and, of course, now more than ever, to prevent potential criminal activity.

As a result of these developments, data mining flowered during the late 1990s, with many commercial, medical, marketing, and manufacturing applications. Retail companies eagerly applied complex analytical capabilities to their data to increase their customer base. The financial community found trends and patterns to predict fluctuations in stock prices and economic demand. Credit card companies used it to target their offerings, microsegmenting their customers and prospects, maneuvering the best possible interest rates to maximize their profits. Telecommunication carriers used the technology to develop "churn" models to predict which customers were about to jump ship and sign with one of their wireless competitors. The ultimate goal of data mining is the prediction of human behavior, which is by far its most common business application; however, this can easily be modified to meet the objective of detecting and deterring criminals.

These and many more applications have demonstrated that rather than requiring a human to attempt to deal with hundreds of descriptive attributes, data mining allows the automatic analysis of databases and the recognition of important trends and behavioral patterns. Increasingly, crime and terror in our world will be digital in nature. In fact, one of the largest criminal monitoring and detection enterprises in the world is at this very moment using a neural network to look for fraud. The HNC Falcon system uses, in part, a neural network to look for patterns of potential fraud in about 80% of all credit card transactions every second of every day.

Likewise, analysts and investigators will come to rely on machines and AI to detect and deter crime and terrorism in today's world. Breakthrough applications are already taking place in which neural networks are being used for forensic analysis of chemical compounds to detect arson and illegal drug manufacturing. Coupled with agent technology, sensors can be deployed to detect bioterrorism attacks. The Defense Advanced Research Projects Agency (DARPA) has already solicited a prototype for such a system.

1.4 Investigative Data Warehousing

Data warehousing is the practice of compiling transactional data with lifestyle demographics for constructing composites of customers and then decomposing them via segmentation reports and data mining techniques to extract profiles or "views" of who they are and what they value. Data warehouse techniques have been practiced for a decade in private industry. These same techniques so far have not been applied to criminal detection and security deterrence; however, they could well be.
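The two steps named in this definition, compiling transactional data with demographics into composites and then decomposing the composites into segments, can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the field names (`person_id`, `housing_type`, and so on), and the helper functions `build_composites` and `segment` are hypothetical and do not come from any warehouse product.

```python
from collections import defaultdict

# Hypothetical transactional records and demographic records, keyed by person.
transactions = [
    {"person_id": "P1", "amount": 120.0},
    {"person_id": "P2", "amount": 15.0},
    {"person_id": "P1", "amount": 80.0},
]
demographics = {
    "P1": {"housing_type": "rental", "length_of_residency_years": 1},
    "P2": {"housing_type": "owned", "length_of_residency_years": 12},
}

def build_composites(transactions, demographics):
    """Compile per-person transaction summaries together with demographics."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in transactions:
        totals[t["person_id"]] += t["amount"]
        counts[t["person_id"]] += 1
    return {
        pid: {**demo, "n_transactions": counts[pid], "total_amount": totals[pid]}
        for pid, demo in demographics.items()
    }

def segment(composites, key):
    """Decompose the composites into groups that share one attribute value."""
    groups = defaultdict(list)
    for pid, record in composites.items():
        groups[record[key]].append(pid)
    return dict(groups)

composites = build_composites(transactions, demographics)
segments = segment(composites, "housing_type")
print(composites["P1"])
print(segments)
```

In a real warehouse the join and the grouping would be done by the database itself; the sketch only shows how a composite view differs from the raw transaction list.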
Using the same approach, behavioral data from such diverse sources as the Internet (clickstream data captured by Internet mechanisms, such as cookies, invisible graphics, registration forms); demographics from data providers, such as ChoicePoint, CACI, Experian, Acxiom, DataQuick; and utility and telecom usage data, coupled with criminal data, could be used to construct composites representing views of perpetrators, enabling the analysis of similarities and traits, which through data mining could yield predictive models for investigators and analysts. As with private industry, better views of perpetrators could be developed, enabling the detection and prevention of criminal and terrorist activity.

1.5 Link Analysis

Effectively combining multiple sources of data can lead law enforcement investigators to discover patterns to help them be proactive in their investigations. Link analysis is a good start in mapping terrorist activity and criminal intelligence by visualizing associations between entities and events. Link analyses often involve seeing via a chart or a map the associations between suspects and locations, whether by physical contacts or communications in a network, through phone calls or financial transactions, or via the Internet and e-mail. Criminal investigators often use link analysis to begin to answer such questions as "who knew whom and when and where have they been in contact?"

Intelligence analysts and criminal investigators must often correlate enormous amounts of data about individuals in fraudulent, political, terrorist, narcotics, and other criminal organizations. A critical first step in the mining of this data is viewing it in terms of relationships between people and organizations under investigation. One of the first tasks in data mining and criminal detection involves the visualization of these associations, which commonly involves the use of link-analysis charts (Figure 1.1).
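Underneath a link-analysis chart is a graph whose nodes are entities of different types (people, phone numbers, accounts, places) and whose edges are recorded associations. The sketch below is an assumption-laden illustration, not any vendor's tool: the entity names, the `links` list, and the helpers `build_graph` and `find_path` are hypothetical. It uses a breadth-first search to answer a "who is connected to whom" question by finding the shortest chain of associations between two entities.

```python
from collections import defaultdict, deque

# Hypothetical associations: (entity, entity, kind of link).
links = [
    ("Suspect A", "555-0101", "called"),
    ("Suspect B", "555-0101", "called"),
    ("Suspect B", "Account 17", "wired money to"),
    ("Suspect C", "Account 17", "received money from"),
    ("Suspect D", "Hotel X", "stayed at"),
]

def build_graph(links):
    """Build an undirected adjacency map from the association records."""
    graph = defaultdict(set)
    for a, b, _kind in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of associations, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_graph(links)
print(find_path(graph, "Suspect A", "Suspect C"))
# prints ['Suspect A', '555-0101', 'Suspect B', 'Account 17', 'Suspect C']
print(find_path(graph, "Suspect A", "Suspect D"))  # prints None
```

Here Suspect A and Suspect C never appear in the same record, yet the search links them through a shared phone number and a shared account, which is the kind of indirect association a chart makes visible.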
Figure 1.1: A link analysis can organize views of criminal associations.

Link-analysis technology has been used in the past to identify and track money-laundering transactions by the U.S. Department of the Treasury, Financial Crimes Enforcement Network (FinCEN). Link analysis often explores associations among large numbers of objects of different types. For example, an antiterrorist application might examine relationships among suspects, including their home addresses, hotels they stayed in, wire transfers they received and sent, truck or flight schools attended, and the telephone numbers that they called during a specified period. The ability of link analysis to represent relationships and associations among objects of different types has proven crucial in helping human investigators comprehend complex webs of evidence and draw conclusions that are not apparent from any single piece of information.

1.6 Software Agents

Another AI technology that can be deployed to combat crime and terrorism is the use of intelligent agents for such tasks as information retrieval, monitoring, and reporting. An agent is a software program that performs user-delegated tasks autonomously; for example, an agent can be set up to retrieve information on individuals or companies via the Web or proprietary secured networks. An agent can be assigned tasks, such as compiling a dossier, interpreting its findings, and, following instructions, acting on those findings by issuing predetermined alerts. For example, agent technology is increasingly being used in the area of intrusion detection, for monitoring systems and networks and deterring hacker attacks.

An agent has three basic abilities:
1. Performing tasks: They do information retrieval, filtering, monitoring, and reporting.
2. Knowledge: They can use programmed rules, or they can learn new rules and evolve.
3. Communication skills: They have the ability to report to humans and interact with other agents.
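The three abilities listed above can be made concrete with a toy monitoring agent. This is a sketch under assumptions rather than a real agent framework: the class `MonitoringAgent`, its methods, the event fields, and the threshold of five failed logins are all hypothetical. The agent performs a task (scanning events), holds knowledge (a set of named rules that can be extended), and communicates (reporting each alert through a callback).

```python
class MonitoringAgent:
    """Toy agent: performs a monitoring task, holds rules, and reports alerts."""

    def __init__(self, rules, report):
        self.rules = dict(rules)  # knowledge: rule name -> predicate on an event
        self.report = report      # communication: callback that receives alerts

    def learn(self, name, predicate):
        """Add a new rule, standing in for an agent that learns and evolves."""
        self.rules[name] = predicate

    def run(self, events):
        """Perform the task: check every event against every rule."""
        alerts = []
        for event in events:
            for name, predicate in self.rules.items():
                if predicate(event):
                    alert = (name, event["user"])
                    alerts.append(alert)
                    self.report(alert)
        return alerts

# Hypothetical login events; the threshold of 5 is an illustrative assumption.
events = [
    {"user": "alice", "failed_logins": 1, "hour": 10},
    {"user": "mallory", "failed_logins": 9, "hour": 3},
]
received = []
agent = MonitoringAgent(
    rules={"many_failed_logins": lambda e: e["failed_logins"] >= 5},
    report=received.append,
)
agent.learn("night_access", lambda e: e["hour"] < 5)
alerts = agent.run(events)
print(alerts)
# prints [('many_failed_logins', 'mallory'), ('night_access', 'mallory')]
```

Passing the reporting channel in as a callback keeps the sketch small; in practice the same slot could hand alerts to a human analyst or to another agent.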
Over the past few years, agents have emerged as a new paradigm: they are in part distributed systems, autonomous programs, and artificial life. The concept of agents is an outgrowth of years of research in the fields of AI and robotics. They represent the concepts of reasoning, knowledge representation, and autonomous learning. Agents are automated programs and provide tools for integration across multiple applications and databases running across open and closed networks. They are a means of managing the retrieval, dissemination, and filtering of information, especially from the Internet.

Agents represent a new type of computing system and are one of the more recent developments in the field of AI. They can monitor an environment and issue alerts or go into action, all based on how they are programmed. For the investigative data miner, they can serve the function of software detectives, monitoring, shadowing, recognizing, and retrieving information on suspects for analysis and case development (Figure 1.2).

Figure 1.2: Software agents can autonomously monitor events.

Intelligent agents can be used in conjunction with other data mining technologies, so that, for example, an agent could monitor and look for hidden relationships between different events and their associated actions and at a predefined time send data to an inference system, such as a neural network or machine-learning algorithm, for analysis and action. Some agents use sensors that can read identity badges and detect the arrival and departure of users to a network, based on the observed user actions and the duration and frequency of use of certain applications or files. A profile can be created by another component of agents called actors, which can also query a remote database to confirm access clearance. These agent sensors and actor mechanisms can be used over the Internet or other networks to monitor individuals and report on their activities to other data mining models, which can issue alerts to security, law enforcement, and other regulatory personnel.

[...]

... commercial and private databases and networks. Data mining has traditionally been used to predict consumer behavior, but the same tools and techniques can also be used to detect and validate the identity of criminals for security purposes. These data mining techniques will herald a new method of validating individuals for security applications over the Internet and proprietary networks and databases. The need for ...

... enhance security; and discover, detect, and deter unlawful and dangerous entities. In the twenty-first century, investigators must begin to use advanced pattern-recognition technologies to protect society and civilization. Analysts need to use data mining techniques and tools to stem the flow of crime and terror and enhance security against individuals, property, companies, and civilized countries.

1.12 Criminal Analysis and Data Mining

Data mining is a process that uses various statistical and pattern-recognition techniques to discover patterns and relationships in data. It does not include business intelligence tools, such as query and reporting tools, on-line analytic processing (OLAP), or decision support systems. Those tools report on data and answer predefined questions, whereas data mining tools focus ...

... of data mining for detection and deterrence. A similar understanding of the environment and the targets of crime can be applied to other situations, so that rather than a building, we might perform a criminal analysis inventory of an e-commerce Web site for illegal hacking intrusions into a server. The next phase of this type of criminal analysis is to use data mining, given the fact that a security ...

... where data mining techniques can be used to transform vast amounts of data generated from multiple sources in order for investigators and analysts to take preventive action to discover, detect, and deter crime and terror. Data mining tools can enable them to use quantifiable observations to construct predictive models in order to identify threats and assess the probability of crimes and attacks rapidly and ...

... courts are expanded. The new act also updates wiretapping laws to keep up with changing technologies, such as cell phones, voicemail, and e-mail. Coupled with data mining techniques, this expanded ability to access multiple and diverse databases will allow the expanded ability to predict crime. Security and risk involving individuals, property, and nations involves probabilities that data mining models ...

... this information exists: it is sitting idly in the government databases from the Social Security Administration and the Departments of State, Transportation, and the Treasury. Obviously the future of homeland security is going to require the application of data mining models in real time, utilizing many different databases in support of multiple agencies and their personnel. Already the Visa Entry Reform ...

... requiring some investigative heuristics, so is data mining. The data is the evidence, but some skill is required to extract a model or rules from the raw records. A methodology exists for data extraction, preparation, enhancement, and mining; however, it is a skill, not a science. As with deductive profiling, no two criminals are exactly alike, and neither are the profiles or MOs constructed from data mining analyses ...

... Internet, and wireless fraud; and money laundering, where investigators and analysts must deal with large volumes of transactions in large databases. Data mining has traditionally been used to predict consumer preferences and to profile prospects for products and services; however, in the current environment, there is a compelling need to use this same technology to discover, detect, and deter criminal ...

... trails that investigators can track and analyze. There is an assortment of tools and techniques for discovering key information concepts from narrative text residing in multiple databases in many formats and multiple languages. Text mining tools and applications focus on discovering relationships in unstructured text and can be applied to the problem of searching and locating keywords, such as names ...

Date posted: 04/06/2014, 13:16
