Data Analytics for Beginners (Paul Kinley)


Document information

Date posted: 23/11/2016, 14:50

Data Analytics is one of the most powerful tools to analyze today’s business environment and to predict future developments. Is it not the dream of every business owner to know exactly what the customer will buy in 6 months, or what the new product hype will look like in your own industry? Data Analytics is the tool that will bring you answers to these questions.

Here’s why Data Analytics for Beginners will bring your business to a completely new level:

• How you can use data analytics to improve your business
• How to plan data analysis to know exactly what your target group wants
• How to implement descriptive analysis

You will learn the exact techniques that are required to master Data Analytics.

Data Analytics for Beginners
Basic Guide to Master Data Analytics

Table of Contents:

Introduction
Chapter 1: Overview of Data Analytics
  Foundations of Data Analytics
  Getting Started
  Mathematics and Analytics
  Analysis and Analytics
  Communicating Data Insights
  Automated Data Services
Chapter 2: The Basics of Data Analytics
  Planning a Study
  Surveys
  Experiments
  Gathering Data
  Selecting a Useful Sample
  Avoiding Bias in a Data Set
  Explaining Data
  Descriptive Analytics
  Charts and Graphs
Chapter 3: Measures of Central Tendency
  Mean
  Median
  Mode
  Variance
  Standard Deviation
  Coefficient of Variation
  Drawing Conclusions
Chapter 4: Charts and Graphs
  Pie Charts
  Create a Pie Chart in MS Excel
  Bar Graphs
  Create a Bar Graph with MS Excel
  Customizing the Bar Graph
  Time Charts and Line Graphs
  Create a Line Graph in MS Excel
  Customizing Your Chart
  Annual Employee Losses
  Adding Another Set of Data
  Histograms
  Create a Histogram with MS Excel
  Creating a Histogram
  Scatter Plots
  Create a Scatter Chart with MS Excel
  Spatial Plots and Maps
Chapter 5: Applying Data Analytics to Business and Industry
  Business Intelligence (BI)
  Data Analytics in Business and Industry
  BI and Data Analytics
Chapter 6: Final Thoughts on Data
Conclusion

Introduction

We live in thrilling and innovative times. As business moves to the digital
environment, virtually every action we take produces data. Information is collected from every online interaction. All sorts of devices gather and store data about who we are, where we are, and what we are doing. Increasingly massive warehouses of data are now freely available to the public. Skilled analyses of all this data can help businesses, governments, and organizations to make better-informed decisions, respond quickly to changing needs, and gain deeper insights into our rapidly changing environment.

It is a challenge even to attempt to make good use of all of the available data. In order to answer specific questions, a person must decide what data to collect, which methods to use, and how to interpret the results. Data analytics is a way to make valuable use of all types of information. Analytics is used to help categorize data, identify patterns, and predict results.

Data use has become so ubiquitous that individuals in every profession now need to learn how to work with data. Those who become the most proficient at working with data in useful and creative ways will be the most successful in the new world of business.

Until recently, data analytics was limited to an exclusive culture of data analysts, who characteristically presented this topic in complicated and often unintelligible terminology. Fortunately, data analytics is not as complicated as many believe. It simply consists of using analytical methods and processes to develop and explain specific and useful information from data. The point of data analytics is to enhance practices and to support better-informed decisions. This can result in safer practices within an industry, greater revenues for a business, higher customer satisfaction, or improvement in any other object of focus.

This eBook introduces a wide range of ideas and concepts used for deriving useful information from a set of data, including data analytics techniques and what can be achieved by using them.

Chapter 1: Overview of Data Analytics

With
a little statistical understanding and procedural training, you will be able to use analytical methods to develop data-based insights. Data analytics offers new ways to understand the world. Businesses and organizations used to make decisions based on assumptions and hope for favorable outcomes. Data analytics gives people the insights that they need to plan for improvements and specific results. Analytics is generally used for the following purposes:

• To enhance business organizations and increase returns on investment (ROI)
• To improve the success of sales and marketing campaigns
• To identify trends and emerging developments
• To make society safer

Foundations of Data Analytics

Data analytics requires the use of mathematical and statistical procedures. It also requires the skills to work with certain software applications and a knowledge of the subject area you are working with. Without knowledge of the subject matter, data analytics is reduced to simple data analysis. Due to the increasing demand for data insights, every field of business has begun to implement data analytics. This has resulted in a variety of analytic specialties, such as market analytics, financial analytics, clinical analytics, geographical analytics, retail analytics, educational analytics, and many other areas of interest.

Getting Started

This chapter explains the major components of data analytics: gathering, exploring, and interpreting data. As a data analyst, you will be collecting and sorting large volumes of raw, unstructured, and partially structured data. The amounts of data that you are likely to be working with can be too large for a normal database system to process effectively. A data set that is too large, changes too quickly, or does not conform to the structure of standard database designs requires a special skillset to manage.

Data analytics consists of analyzing, predicting, and visualizing data. When data analysts gather, query, and interpret data, they conduct a
process that is quite similar to data engineering. Although useful insights can be produced from an individual source of data, the blending of several sources gives the data the context that is necessary to make more informed decisions. As a data analyst, you can combine multiple datasets that are maintained in a single database. You can also work with several different databases maintained within a large data warehouse. Data can also be maintained and managed within a cloud-based platform specially designed for that purpose. However the data is pooled and wherever it is stored, the analyst must still issue queries on the data and give commands to retrieve specific information. This is typically done using a specialized database language called Structured Query Language (SQL).

When using a database software application or conducting an analysis using other programming languages, like R or Python, you can utilize a variety of digital file formats, such as:

• Comma-separated values (CSV) files: Virtually all data-based software applications (including cloud-based programs) and scripting languages are compatible with the CSV file type.
• Programming scripts: Professional data analysts generally know how to write programming scripts in order to work with data and visualizations in languages like Python and R.
• Common file extensions: MS Excel files have the .xls or .xlsx extension. Geospatial applications save files in their own formats (e.g., the .mxd extension for ArcGIS and the .qgs extension for QGIS).
• Web programming files: Web-based data visualizations often use the Data-Driven Documents JavaScript library (D3.js).
D3.js files are saved as .html files.

Mathematics and Analytics

Data analytics requires the ability to perform mathematical and statistical operations. These skills are necessary both to make sense of the data and to evaluate its relative significance. They are also important because they can be used to conduct data forecasting, decision analytics, and hypothesis testing. Before getting into more advanced explorations of mathematical and statistical procedures, we will take some time to explain some distinctions between mathematics and analytics.

Mathematics relies on specific numerical procedures and deductive reasoning to develop a mathematical explanation of some phenomenon. Like mathematics, analytics provides a mathematical description of a phenomenon, and analytics is itself based on mathematics. However, analytics uses inductive reasoning and probability to form conclusions and explanations. Data analysts use mathematical procedures to build decision models, to produce estimations, and to make forecasts.

In order to follow this book, you need little more than common math skills. This book will teach you how to use statistical techniques to develop insights from data. In the field of data analytics, statistical procedures are used to determine the meaning and significance of data. This can then be utilized to test hypotheses, build data simulations, and make predictions about future outcomes.

Analysis and Analytics

The major difference between data analysis and data analytics is the need for subject knowledge. Typical statisticians specialize in data procedures and have little to no knowledge of other fields of study. They must consult with others who have subject-specific expertise to know which data to look for and to help find meaning in that data. Data analysts, on the other hand, must understand their subject matter. They seek to gain important insights that they can use with their subject-matter expertise to make
meaning of those insights. Below is a list of ways that subject-matter experts use analytics to enhance performance in their areas:

• Engineering analysts use data analytics with building designs.
• Clinical data analysts use predictive methods to foresee future health issues.
• Marketing data analysts use regression data to predict and moderate customer turnover.
• Data journalists search databases for patterns that may be worth investigating.
• Crime data analysts develop spatial models to identify patterns and predict future crimes.
• Disaster relief data analysts work to organize and explain important data about the effects of disasters, which is then used to determine the types of assistance needed.

Communicating Data Insights

Data analysts often have to explain data in ways that non-technical people can comprehend. They must be able to create understandable data visualizations and reports. Generally, people need to see data in the form of charts, graphs, and pictures in order to understand it. Analysts have to be both creative and practical in the ways that they communicate their findings.

Organizational leaders often have difficulty figuring out what to do with all of the data that their organization collects. What they know, however, is that effectively using analytical tools can help them to both strengthen their business or organization and gain a valuable competitive edge. Currently, very few of these leaders know the available options for engaging in the process. The following section discusses the major data analytics solutions and the benefits that organizations can gain from them.

When implementing data analytics within an organization, there are three key approaches. One can create an internal data analytics department, contract out the assignments to independent data analysts, or pay for a cloud-based software-as-a-service (SaaS) solution that enables novices to utilize powerful data analytics tools.

There are a few
major ways to create an internal data analytics team:

• Train current personnel: This can be an inexpensive way to provide an organization with ongoing data analytics. This training can be used to transform certain employees into highly skilled subject-matter experts who are proficient in data analysis.
• Train current personnel and also hire professional analysts: This strategy follows the same process as the first method, but also includes hiring a few data professionals to oversee the process and personally handle the most challenging problems and tasks.
• Hire data professionals: An organization can get its needs met by hiring or contracting with professional data analysts. This is the most expensive option, because professional data analysts are in short supply and generally command high salaries.

Securing highly skilled data analysts to meet the needs of an organization can be extremely difficult. Many businesses and organizations outsource their data analytics jobs to external experts. This happens in two different ways. They may contract with someone to develop a wide-ranging data analytics plan to serve the entire organization, or they may contract with experts to provide individual data analytics solutions for specific situations and problems that the organization may encounter.

Automated Data Services

Although you must understand certain statistical and mathematical procedures, it is not essential to learn how to code like professional analysts. Software applications have been developed that provide powerful capabilities without requiring you to code or script. Cloud-based platform solutions can provide organizations with most or all of their data analytics needs, although training is still required for personnel to operate the cloud platform programs.

This book will teach you how to use the power of data analytics to achieve individual and organizational goals. Regardless of your field of work, learning data analytics can help you to become
a more proficient and sought-after professional. Below is a brief list of benefits that data analytics provides for various areas:

• Benefits for corporations: Cost minimization, higher return on investment (ROI), increased staff productivity, reduction of customer loss, higher customer satisfaction, sales forecasting, pricing-model enhancement, loss detection, and more efficient processes
• Benefits for governments: Increased staff productivity, improved decision-making models, more reliable budget forecasting, more efficient resource allocations, and discovery of organizational patterns
• Benefits for academia: More efficient resource allocations, improved instructional focus and student performance, increased student retention, refinement of processes, reliable budget forecasting, and increased ROI for student recruitment practices

This chapter provided an introduction to the concept of data analytics. Analytics is a growing field of science that brings together traditional statistical procedures and computer science in order to derive meaningful insights from huge sets of raw data for the benefit of businesses, organizations, governments, and society. Data analytics is sometimes confused with Business Intelligence (BI) because of the common tools they both share, particularly data visualizations, such as traditional charts and graphs. BI, however, is a discipline designed for business leaders without the advanced training necessary to engage in data analytics. The following chapter discusses the basic principles of data analytics.

Chapter 2: The Basics of Data Analytics

This chapter will help you to understand the big picture of the field of analytics. It will discuss the steps of the scientific method, and it will help you to learn how to apply analytics at each step of the scientific process. Analytics does not consist only of analyzing data. It also consists of using the scientific process to find answers to questions and make important decisions. The process
includes designing studies, gathering useful information, explaining the data with figures and charts, exploring the data, and drawing conclusions. We will now examine each step in this process and discuss the critical role of analytics.

Planning a Study

Once the research question is established, it is time to design a study to answer that specific question. This requires figuring out the methods that you will use to extract the necessary data. This section covers the two main types of studies: descriptive studies and experimental studies.

Surveys

With a descriptive study, data are gathered from people in a way that does not have an impact on them. The most widely used type of descriptive study is a survey. Surveys are questionnaires that are given to people who are randomly selected from a target population. Surveys are useful tools for gathering information. As with all methods of gathering data, however, improperly conducted surveys are likely to result in inaccurate information. Common issues with surveys include poorly worded questions, which can be confusing; lack of participant response; and lack of randomization in the selection process. Any of these problems can invalidate the results of the survey; therefore, surveys must be carefully planned before they are implemented.

A limitation of surveys is that they can only provide information on relationships that exist between variables, not on causes and effects. If survey researchers observe, for example, that people who smoke cigarettes tend to work longer hours per day than those who do not smoke, they are not in a position to suggest that smoking is the cause of the longer work hours. Variables that were not part of the research design, such as the number of hours people sleep every night, might cause the relationship.

Experiments

Experiments involve the application of one or more treatments to subjects in a controlled environment. The treatments are things that may or may not affect the subject
under study. Some studies involve medical experiments, wherein the subjects are patients who undergo medical treatments. Other experiments might include students who receive tutoring or exposure to a particular instructional tool as the treatment. Businesses engage in experiments that involve sample participants from the consumer market. These participants may be exposed to a certain type of advertisement and asked how they were emotionally affected.

Once the treatments are applied, the responses are systematically recorded. For instance, to study the effect of a drug dosage amount on blood pressure, one group of subjects may be administered 15 mg of a medicine. A different sample group may be administered 30 mg of the same drug. Typically, a control group is also involved, in which each subject receives a placebo treatment (i.e., a substance with no medicinal properties).

Experiments are often designed to take place in a controlled setting in order to reduce the number of potential unrelated variables and possible biases that might affect the results. Possible problems include researchers knowing which participants received particular treatments; a circumstance or condition not factored into the study that may affect the results (e.g., other medications that a participant may be taking); or the lack of an experimental control group. However, when experiments are designed correctly, differences in responses found when the groups are compared allow the researchers to conclude that there is a cause-and-effect relationship. No matter what the study, it must be designed so that the original questions can be answered in a credible way.

Gathering Data

Once a research plan (whether descriptive or experimental) has been designed, the subjects must be selected and data must be gathered. This stage of the research process is essential to generating meaningful data. The ways in which data are collected vary with the type of study. In experimental designs, the data
should be collected in the most controlled manner possible, in order to reduce the possibility of generating contaminated results. Some experiments require more rigorous procedures than others. When gathering data on people’s perceptions of a new business marketing strategy, or data concerning the effectiveness of a new teaching strategy, the consequences of inaccurate results are not as critical as they would be in a medical study. Therefore, in low-stakes experiments, it is sometimes preferable to use less robust data-gathering procedures in order to save time and money.

Selecting a Useful Sample

In analytics, as with computer programming, garbage in results in garbage out. If subjects are improperly chosen, for example by giving some a greater chance of being selected than others, the results will be unreliable and not useful for making decisions.

For example, John is researching the attitudes of individuals about a possible new tax. John stands in front of a local grocery store and asks passers-by to share their thoughts and attitudes. The problem is that John will only get the attitudes of individuals who a) shop at that grocery store, b) on that specific day, c) at that specific time, and d) actually choose to participate. Because of his limited selection process, the subjects in his survey are not representative of the entire population of the town.

Likewise, John could design an online survey and ask people to input their feedback on the new tax. However, only people who are aware of the website, have access to the Internet, and choose to participate will provide data. Characteristically, only people with the strongest attitudes are likely to participate. Again, these participants would not be representative of everyone in the town.

In order to avoid such selection bias, it is necessary to select the sample randomly, using some type of process that gives everyone in the population the same statistical opportunity to be chosen. There are various methods for
randomly selecting subjects in order to get valid and usable results.

Avoiding Bias in a Data Set

If you were conducting a phone survey on political voting preferences, and you made your calls to people’s landlines at home between the hours of 8:00 a.m. and 4:00 p.m., you would fail to get feedback from individuals who work at that time. Perhaps those who work during those hours have different preferences than those at home during those hours. For example, more business owners may be at home and express voting preferences for something completely different than members of the working class.

Surveys that are poorly designed may be too lengthy, resulting in some participants quitting before they finish. Participants may not be completely honest if the questions are too personal. If the list of choices is too limited, the survey will not be able to capture valuable data that people would have provided. Many things can render survey data invalid.

Distribution of Exam Grade Frequencies

If a data point were to fall precisely on the boundary between two categories, then you may want to round up or down. The validity of the chart is not affected by such decisions, as long as they are consistent.

A histogram displays three primary elements of numerical data:

• The relative distribution of the data (e.g., relative to a bell shape)
• The range of variance in the data (the levels of difference between categories)
• The central point of the data set (if you make a combination chart)

A key feature of histograms is that they display the distribution of the data as a shape, which can be used to make simple inferences. Of course, shapes vary with each different set of data; however, there are three main shapes that are commonly looked for in a set of data:

• Symmetric (the left side of the histogram mirrors the right side)
• Skewed right (the left side is high and gets continually lower going right)
• Skewed left (the left side is low and gets continually higher going right)
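The binning rule and the three shapes described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the book's Excel workflow: the grade values, the bin edges, the helper names (`bin_counts`, `skewness`, `describe_shape`), and the 0.5 cutoff used to call a data set skewed are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def bin_counts(values, edges):
    """Count how many values fall into each bin.

    Bin i covers edges[i] up to (but not including) edges[i + 1]. A value
    that sits exactly on a boundary always goes into the upper bin, which is
    one consistent rule among several possible ones. The last bin also
    includes its right edge. Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


def skewness(values):
    """Population skewness: the average cubed z-score of the values."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def describe_shape(values, cutoff=0.5):
    """Label a data set using an assumed, adjustable skewness cutoff."""
    g = skewness(values)
    if g > cutoff:
        return "skewed right"
    if g < -cutoff:
        return "skewed left"
    return "roughly symmetric"


# Hypothetical exam grades and bin edges, for illustration only.
grades = [55, 62, 68, 70, 71, 74, 75, 75, 78, 80, 82, 85, 88, 90, 95]
edges = [50, 60, 70, 80, 90, 100]

counts = bin_counts(grades, edges)
for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low:>3}-{high:<3} {'#' * count}")
print("Shape:", describe_shape(grades))
```

Changing the boundary rule (for example, placing boundary values in the lower bin instead) changes a few counts but not the overall picture, provided the same rule is applied to every value.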
Variability in the data from a histogram

Histograms also help to illustrate levels of variability within a data set. A histogram that is generally flat along the top may appear to have low variability; however, it actually indicates a wide range within the data set. Having the same numbers in each category means that the measures were spread out widely. A hill in the center shows that the majority of measures are near the central point, with a few straying away from the center in both directions (which is to be expected). The higher the center point, the lower the variability in the data. In more advanced analyses, the distance of the outliers to the left and right of the center takes on greater significance.

Variability in a histogram is distinct from variability in a time chart. When values on a time chart change over a period of time, they move either higher or lower on the chart. More highs and lows along a time chart indicate greater variability. Conversely, a flat line on a time chart indicates low variability.

Below are considerations for evaluating a histogram:

• Inspect the scale being utilized for the frequency (vertical axis). Understand that results can be made to appear less or more significant by adjusting the size of the scale. For example, if the weights of a group of people range from end to end by 20 pounds, this can appear to be massive by using a gram scale or insignificant by using a ton scale.
• Examine the units along the vertical axis to see whether the graph is using frequencies (counts) or relative frequencies (percentages).
• Check the size range of the categories for the numerical variables (on the horizontal axis). If they represent very small measures, the data may appear to have excessive variation. If they are very large, the chart may conceal significant amounts of variation.

Create a Histogram with MS Excel

Installing the Analysis ToolPak Add-In

In order to create a histogram with MS Excel, you must install the Analysis ToolPak Add-in.
This section covers the installation process.

Step 1: Locate the “Excel Add-ins” box under File. You can do this from the MS Excel Home screen:

a) Click “Options” on the File menu.
b) Click “Add-Ins.”
c) Under “Manage,” select “Excel Add-ins.”
d) Click “Go.”

In the Add-Ins dialog box, click on the Analysis ToolPak check box. It is located under “Add-Ins Available.” Next, click “OK.” The Analysis ToolPak Add-in will not be in the dialog box if it has not been previously installed. If the Analysis ToolPak is not in the dialog box, run MS Excel Setup and add the ToolPak to your list of installed items. Now that the Analysis ToolPak is installed and enabled, you are ready to create a histogram.

Creating a Histogram

Step 1: Enter the Data

Enter your data into two adjacent columns. Populate the left column with the “input data” (the set of values that you will analyze with the Histogram tool). In the right column, place your “bin numbers” (the segments that you use for separating and analyzing your data values). For example, in order to organize ratings into categories of Good, Better, and Best, you could make bins for 1, 2, and 3.

Navigate to the Data tab at the top of the screen, and click “Data Analysis” in the Analysis group. This will start up the Analysis ToolPak and open the Data Analysis dialog box. In the Data Analysis dialog box, scroll down to “Histogram,” and click “OK.” This opens the Histogram dialog box.

In the Histogram dialog box, select the input range and the bin range from your worksheet. This is done by clicking in each input box. The input range contains the data that you want to analyze. If the input data is a set of 30 values, and you have copied it into the B column (from B1 to B30), then enter your input range as B1:B30. The bin range consists of the bin numbers. For example, if there are five bins at the very top of column C, then your bin range will be C1:C5.

Under Output Options, click “New Workbook.” Then, place a check in the “Chart Output”
check box. Once you click “OK,” you are finished. Excel will produce a new workbook containing a histogram table along with your chart.

Distribution of Book Prices

Scatter Plots

Scatter plots are charts that visually represent the relationship between two variables. A scatter plot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots. Each dot on the scatter plot represents one observation from a data set. The position of the dot on the scatter plot represents its X and Y values.

The example chart below displays the relationship between iPhone sales and Galaxy Note sales. When we examine the number of Galaxy Note sales along the X (horizontal) axis, we see that the more Galaxy Note sales there are, the more iPhone sales there are. The red trend line illustrates this relationship. If the trend line were horizontal and flat, that would tell us that as Galaxy Note sales go up, iPhone sales stay about the same. A downward-sloping trend line would indicate that as Galaxy Note sales rise, iPhone sales drop. This would be a possible situation within a small population (e.g., 10 customers) who simply have to choose one phone or the other.

The dots along the trend line represent actual data points. These data points give us specific information about the units being measured. They also help us to see the variance in the set of data. The closer the data points are to the trend line, the stronger the relationship between the two variables. The more spread out they are, the weaker the relationship. A weak relationship, for example, might be observed if the data were collected from a customer population that had several other types of phones to choose from besides these two.

Relationship between sales of the iPhone and Galaxy Note

If you want to examine relationships between several sets of variables within a data set, you could utilize a scatter plot matrix. This is a series of scatter plots within a single graph that shows the relationships
between multiple variables. Identifying and demonstrating relationships between variables enables analysts to draw important conclusions that organizational leaders can use to help them achieve their goals efficiently.

Using a scatter plot is a good way to easily visualize important patterns and identify outliers in a set of data. This type of graph plots data points along the X and Y axes, which helps to describe the central tendency of the data. Because different charts describe different characteristics of data, it is always a good idea to use multiple types of graphs and charts to explain a set of data. When using a variety of graphical displays to explain a data set, it can be useful to begin with a scatter plot, because it will give you a big-picture view of the data characteristics. You could then follow up with a pie chart, histogram, and bar charts in order to gradually focus in on specific elements within the data set. Graphs and charts allow you to tell a story about your data in a way that is accessible to non-analysts.

Create a Scatter Chart with MS Excel

Select the data that you want to plot in the scatter chart. On the Insert tab, in the Charts group, click “Scatter.” Click “Scatter with only Markers.” Your chart will appear on your Excel worksheet. If you want the trend line, select the chart, click “Chart Elements,” and check the box marked “Trendline.”

The Relationship between McDonald’s Menu Prices and Calories

Spatial Plots and Maps

Two ways to represent spatial data are spatial plots and maps. A map is simply an image that signifies the sizes, shapes, and locations of a geographical area. Spatial plots visualize the values and locations of the distribution of a set of data. Below are a few common types of spatial plots and maps:

• Choropleth Maps: Choropleth maps are spatial data plotted out along area boundary shapes, rather than by point, line, or raster coverage. For example, in a map of the U.S., state boundaries represent area boundary
shapes. Colors may be used within areas to signify some sort of value for an attribute being looked at in each state. Perhaps red areas indicate higher values and blue areas signify smaller values.
• Point Maps: These are composed of spatial data plotted out along specific point locations. Point maps visually display data in a graphical point form, rather than in shape, line, or raster surface formats.
• Raster Surface Maps: These maps can be anything from a satellite image map to a surface coverage with values that have been included from basic spatial data points.

This chapter discussed the purpose and concepts behind common visual methods for displaying data. Descriptive analytics uses numbers to summarize aspects of a collection of data. It gives you understandable information to help you answer research questions. It can also help you to understand what is happening in your experiment, so that you can later conduct more in-depth analyses. Visual representations of data help analysts to present data to the outside world plainly and succinctly.

Chapter 5: Applying Data Analytics to Business and Industry

Business Intelligence (BI)

The goal of business intelligence is to transform raw data into organized information that can be used to provide insights that business leaders can apply to make well-informed decisions. Business data analysts rely on business intelligence (BI) tools to help them generate decision models. To build data analytic dashboards, visual presentations, or data reports from collections of data, you can benefit from the use of BI tools to help with the process. Business intelligence (BI) consists of:

• Large public and private collections of data: Private collections are information sets supplied by the organization’s data collection methods.
• Technological tools and skillsets: This includes online analytical data procedures, database development and management, warehousing of data, and information technology (IT) for business
programs and applications.

The Types of Data Used in Business Intelligence

Insights that are generated in business intelligence (BI) result from standardized sets of organized business data. BI solutions draw primarily on transactional data that is produced in the course of countless events, such as data created during sales or records resulting from financial transfers among bank accounts. Transactional data is naturally produced by business actions occurring throughout the organization. This data is critical for the variety of insights that can be gathered from it. BI can be used to extract the following types of business insights:

• Customer Information: This data can help managers identify, for example, the areas of their business that are creating the most customer turnover.
• Marketing Data: This data can let a business know which specific marketing strategies are most effective and what exactly makes them so effective.
• Operational Data: This data can let a business know how efficiently different departments are functioning and the best actions to take in order to fix identified problems.
• Employee Data: This data can let a business know which employees are producing the most and which are producing the least.

Because the results of data analytics are often extracted from large datasets, cloud-based data platform solutions are common in the field. Data that is used in data analytics is often derived from data-engineered big data solutions, like Hadoop, MapReduce, and Massively Parallel Processing. Data analysts must be innovative forward-thinkers who often come up with creative solutions in order to overcome limitations in data collection and interpretation. Many data analysts prefer open-source solutions. Because open-source software is free and has a robust development architecture, it is quite popular among analysts. This benefits the organizations that employ these analysts.

Data Analytics in Business and Industry

• Transactional
Data: This is the type of organized data used in most BI models. It includes administration data, customer data, marketing data, organizational data, and employee productivity data.
• Social Data: This includes the unfiltered data generated from emails and social networks, like Facebook, Twitter, LinkedIn, Pinterest, and Instagram.
• Machine Data from Business Operations: This data is used to monitor the organization’s equipment and machines.
• Audio, Video, Image, and PDF File Data: These well-established formats are all sources of unstructured data.

To streamline BI processes, you must make sure that your data is structured for maximum ease of access and control. You can use multidimensional databases to accomplish this. Unlike the popular relational and flat databases, multidimensional databases sort data into cubes that are organized as multidimensional data arrays. To manipulate your data as rapidly and effortlessly as possible, you can place it in a multidimensional database as a cube, rather than spreading it among multiple relational databases that may have difficulty working with each other.

The cubic data architecture allows for online analytical processing (OLAP). OLAP is a technology with which you can conveniently access and use all of your data for several different procedures and explorations. To understand the OLAP model, imagine that you have a cube of market data with three dimensions: time, location, and department. You could, for example, arrange the data to examine only one rectangle, in order to view one particular department. You could arrange the data to explore a proportionately smaller cube, consisting of a specific period of time and specific locations and departments. You could also drill up or down in your data set to view very detailed data or highly summarized data. You could also roll up, or total, a range of numbers along a single dimension in order to sum up the totals for small units of business or examine sales across an
extended period of time within a specific location.

OLAP is just one system for warehousing data. Another data warehouse system that is popular among BI solutions is called a data mart. This is a data management system used to store specific elements of data, serving only one area of business in the organization. The process used for extracting, transforming, and loading the data into a database or data mart is known as extract, transform, and load (ETL). Typically, business analysts are highly trained in BI technology. As a general rule, BI training is accompanied by traditional IT training and development.

Within the business world, data analytics fulfills the same function as BI, and that is to turn mountains of raw data into useful information that can help business leaders make informed, strategic business decisions. If you have large sets of unconnected data sources that may be incomplete, and you want to convert all of that into valuable and useful business insights throughout the entire organization, business data analysts can help. They produce critical data insights by identifying patterns and abnormalities in business data. Data analytics in the business world consists of:

• Quantitative Examination: This includes mathematical modeling, statistical analysis, predictive forecasting, and data simulations. These processes often involve more than one variable at a time.
• Programming Skills: You do not need to have software programming skills to gather, organize, and explore the data, or to share this data with stakeholders.
• Business Knowledge: Knowing the particular business from a functional perspective will help you to better understand the relevance and meaning of your findings.

Useful Technologies and Skills

Business-centric data analysts might use machine learning techniques to find patterns in (and derive insights from) huge datasets that are related to a line of business or the business at large. They’re skilled
in math, analytics, and programming, and they sometimes use these skills to generate predictive models. They generally know how to program in Python or R. Most of them know how to use SQL to query relevant data from structured databases. They are usually skilled at communicating data insights to end users; in business-centric data analytics, end users are business managers and organizational leaders. Data analysts must be skillful at using written, oral, and visual means to communicate valuable data insights.

Although business-centric data analysts serve a decision-support role in the enterprise, they’re different from the business analyst in that they usually have strong academic and professional backgrounds in math, analytics, engineering, or all of the above. That said, business-centric data analysts also have strong substantive knowledge of business management.

BI and Data Analytics

The similarities between BI and business-related data analytics are easy to see. They are both rooted in fundamental statistical analyses. The dissimilarities, however, are not quite so apparent. The function of both is to transform raw data into useful information that can be used to make well-informed decisions. BI and data analytics diverge in their methods and approaches.

BI employs predictive methods, such as forecasting. This is done by drawing basic inferences from past and current information. Therefore, BI draws from the past and present to make predictions about future events. Relevant data from historical and current trends can be extremely useful for guiding organizational planning and operations. It is also instrumental in guiding day-to-day decisions.

Data analytics, on the other hand, seeks to make discoveries through the use of advanced mathematical or statistical methods. It analyzes and makes predictions based on massive amounts of unprocessed data. Such forward-thinking discernment is critical to the long-term success of an organization. Data analysts try to
discover new models of thinking and original ways of understanding data, in order to provide a fresh perspective on the organization, the way it functions, and its relationships with stakeholders. Data analysts, unlike statisticians and traditional business analysts, must have an authentic understanding of the business itself and its context. Data analytics requires organizational knowledge, in order to understand how new information is relevant to the current culture of the organization and its goals. A few other things that distinguish data analytics from traditional BI include:

• Sources of Data: BI relies on structured data housed in relational databases. Data analysts utilize both structured and unstructured data, for example, the information generated by machines or in social media interactions.
• Products: Traditional BI products include reports, data tables, and decision dashboards. Data analysts, on the other hand, produce outputs that may be related to dashboard analytics and advanced data visualization, but typically not data reports. Data analysts typically relay their findings through words and data visualizations, but not tables and reports. This is because the sources of data with which they work tend to be more complex than a typical organizational leader would be able to truly grasp.
• Technology: BI relies on relational databases, data warehouses, OLAP, and ETL technologies. Data analytics utilizes data-engineered systems that use Hadoop, MapReduce, or Massively Parallel Processing.
• Expertise: BI relies heavily on IT and business technology expertise, whereas data analysts rely on expertise in analytics, statistical methods, computer programming, and business.

Because most business leaders are not trained to perform advanced data analytics themselves, it is beneficial for them to distinguish the types of decisions that are best suited for business leaders from those best left to their data analysts. In our rapidly-evolving knowledge-based
economy, organizations seeking to remain competitive must constantly become more efficient in their operations and more strategic with resources. The key to this is capitalizing on the opportunities provided by skilled analyses of industrial-level Big Data.

Chapter 6: Final Thoughts on Data

Prior to the recent rise in analytics, businesses and organizations did not have the capacity to analyze a great deal of data, so a relatively small amount was maintained. In today’s data-driven world, anything and everything may have significance, so there has been an attempt to record and keep virtually any data that we have the capacity to collect, and we have a great deal of capacity.

Beyond the quantity of data that we are gathering and storing is the quality of the data. That is to say, data has grown beyond basic facts and figures to encompass media files. Video, audio, and presentations have all become units of data for possible analysis. A major concern with regard to data analytics is how to store and maintain all of these rapidly-increasing piles of data. The data science community has begun to rely more heavily upon the software engineering community in order to find solutions to our over-abundance of data.

Not all data is necessarily valuable. Society now has advanced data analytics that allow us to glean useful and important information from even the smallest bits of data. Such information, when reconciled with other groups of information, can result (and often has resulted) in breakthroughs in modern science, business, and the economy.

As we consider our need to increase the role of data analytics in the ways that we function as organizations, we should keep in mind that data does not contain all of the answers to our growth and advancement. Data provides us with the building material with which we can create new understanding and innovation. The other part of the process is distinctively human. This part includes creativity, risk taking, and cooperation. It appears as though the less we
have of one, the more we need of the other. The more intellectual rigor and collaboration there is between various fields of science, the more we seem to benefit from even limited amounts of data. Conversely, the less of those things we have, the more data we need in order to learn, grow, and innovate. Perhaps the solution to our looming problem with big data is to reduce our need for so much of it.

Conclusion

As we have seen, data analytics is an inclusive and encompassing field of study. What distinguishes data analytics from traditional areas of data analysis is its orientation toward the business world and its focus on Big Data. Data analytics exists at the intersection of data science and computer technology. Each of these sciences is constantly evolving, and each heavily influences the other. Although a career in data analytics does not require specialized training in computer programming, familiarizing oneself with the fundamentals of computer science will definitely benefit a data analyst.

This introductory book has provided you with the understanding and skills necessary to move on to advanced principles, techniques, and procedures in data analytics. Advanced data analytics builds upon the fundamentals that are covered in this book. Even the most sophisticated studies begin with the basic research design principles that we discussed, measures of central tendency, descriptive analytics, basic charts and graphs, and analysis of variance. The differences lie in the additional procedures that are conducted in order to further evaluate the quality of the data and the reliability of the results. The majority of data analytics is accomplished utilizing the fundamental principles that you have just learned.
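
As a companion to the OLAP discussion in Chapter 5, the cube operations described there (viewing one department, selecting a proportionately smaller cube, and totaling along a single dimension) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the sales figures, the city and department names, and the helper functions `slice_cube`, `dice_cube`, and `roll_up` are all made up for this example and do not come from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts keyed by (time, location, department).
# All values are invented purely for illustration.
cube = {
    (2015, "Boston", "Toys"): 120,
    (2015, "Boston", "Books"): 80,
    (2015, "Denver", "Toys"): 100,
    (2015, "Denver", "Books"): 60,
    (2016, "Boston", "Toys"): 150,
    (2016, "Boston", "Books"): 90,
    (2016, "Denver", "Toys"): 110,
    (2016, "Denver", "Books"): 70,
}

DIMENSIONS = ("time", "location", "department")


def slice_cube(cube, dimension, value):
    """Keep only the cells where one dimension equals a single value."""
    axis = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cube.items() if key[axis] == value}


def dice_cube(cube, **allowed):
    """Keep a smaller cube: each named dimension must fall in its allowed set."""
    def keep(key):
        return all(key[DIMENSIONS.index(dim)] in values
                   for dim, values in allowed.items())
    return {key: amount for key, amount in cube.items() if keep(key)}


def roll_up(cube, dimension):
    """Group by one dimension and total the sales over the other two."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)


toys_only = slice_cube(cube, "department", "Toys")              # one department
boston_2016 = dice_cube(cube, time={2016}, location={"Boston"})  # smaller cube
sales_by_year = roll_up(cube, "time")                            # totals per year
boston_by_year = roll_up(slice_cube(cube, "location", "Boston"), "time")
```

Storing each cell under a tuple of dimension values keeps the sketch close to the cube picture in the text; a real multidimensional database would add indexing and pre-computed aggregates on top of the same idea.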