Statistical Analysis with R Beginner's Guide

Document information

www.it-ebooks.info

Statistical Analysis with R
Beginner's Guide

Take control of your data and produce superior statistical analyses with R

John M Quick

BIRMINGHAM - MUMBAI

Statistical Analysis with R Beginner's Guide

Copyright © 2010 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2010
Production Reference: 1191010
Published by Packt Publishing Ltd., 32 Lincoln Road, Olton, Birmingham, B27 6PA, UK
ISBN 978-1-849512-08-4
www.packtpub.com
Cover Image by John M Quick (john@johnmquick.com)

Credits

Author: John M Quick
Reviewers: Ajay Ohri, Joshua Wiley
Acquisition Editor: Douglas Paterson
Development Editor: Meeta Rajani
Technical Editor: Vanjeet D'souza
Editorial Team Leader: Akshara Aware
Project Team Leader: Priya Mukherji
Project Coordinator: Jovita Pinto
Proofreaders: Aaron Nash, Chris Smith
Graphics: Nilesh Mohite
Production Coordinator: Aparna Bhagat
Indexer: Tejal Daruwale
Cover Work: Aparna Bhagat

About the Author

John M Quick is an Educational Technology Ph.D. student at Arizona State University who is interested in the design, research, and use of educational innovations. Currently, his work focuses on mixed-reality systems, interactive media, and innovation adoption. In addition, he has recently published multiple gaming applications for the iPhone and iPad. John's blog, High-Technically Correct, which covers various topics in technology, is available online at http://www.johnmquick.com.

I give thanks to the R Project and its user community for offering the world superior open-source statistical software. I also thank Dr. Roy Levy for introducing me to, and encouraging me to share my knowledge of, R. Lastly, I would like to thank my parents for their lifelong support and Zarraz for the companionship and insights that she offered to me throughout the authoring of this book.

About the Reviewers

Ajay Ohri has been working in the field of analytics since 2004, when it was still a nascent, emerging industry in India. He has worked with the top two Indian outsourcers listed on NYSE, and with Citigroup on cross-sell analytics, where he helped sell an extra 50,000 credit cards. He was one of the very first independent data mining consultants in India working on analytics products and domestic Indian market analytics. He regularly writes on analytics topics on his website www.decisionstats.com and is currently working on open source analytical tools like R and analytical software like SAS.

Joshua Wiley has implemented R in several laboratories on multiple campuses of the University of California system to run statistical analyses and produce high-quality graphics. He also uses it for data processing in descriptive and inferential statistics. He is currently working towards his Ph.D. at UCLA, where he researches Health Psychology. In addition to his own work with R, Mr. Wiley has led tutorials for other psychology researchers on using R, and is an active member of the R-help mailing list.
Preparing R for Battle: Before you can begin to formulate a strategy for the Shu forces, you must ensure that your data analysis tool is in working order. Fortunately, R can be prepared for battle...

... working directory. Furthermore, all file path arguments in functions are evaluated relative to the working directory. Therefore, it is important to set your working directory each time you use R.
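The point about relative file paths can be illustrated with a short sketch. This is not taken from the book: the folder name `shu_analysis`, the file name `battles.csv`, and the two data rows are hypothetical, made up purely for illustration. It uses only base R functions (`setwd`, `getwd`, `write.csv`, `read.csv`) and a temporary folder so that nothing on your machine is overwritten.

```r
# Hypothetical project folder inside R's temporary directory (an assumption
# for this sketch; in practice this would be your own analysis folder).
project_dir <- file.path(tempdir(), "shu_analysis")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the working directory and returns the previous one,
# which is kept here so that it can be restored at the end.
old_wd <- setwd(project_dir)

# Made-up example data, written with a relative path.
# Because the path is relative, the file lands inside project_dir.
battles <- data.frame(method = c("ambush", "fire"),
                      soldiers = c(1000, 2500))
write.csv(battles, "battles.csv", row.names = FALSE)

# The same relative path is resolved against the working directory on read.
loaded <- read.csv("battles.csv")

# Show where R is currently looking for files, then the imported data.
print(getwd())
print(loaded)

# Restore the original working directory.
setwd(old_wd)
```

If the working directory were not set first, the same `read.csv("battles.csv")` call would look in whatever folder R happened to start in, which is why setting it at the start of each session is a sensible habit.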

Date posted: 21/02/2014, 10:20

Table of contents

  • Cover

  • Copyright

  • Credits

  • About the Author

  • About the Reviewers

  • Table of Contents

  • Preface

  • Chapter 1: Uncovering the Strategist's Data Analysis Tool

    • What is R?

    • What are the benefits of using R?

    • Why should I use R?

    • Why should I read this book?

    • What topics are covered in this book?

      • Chapter 2—Preparing R for Battle

      • Chapter 3—Exploring the Mysterious Data Analysis Tool

      • Chapter 4—Collecting and Organizing Information

      • Chapter 5—Assessing the Situation

      • Chapter 6—Planning the Attack

      • Chapter 7—Organizing the Battle Plans

      • Chapter 8—Briefing the Emperor

      • Chapter 9—Briefing the Generals

      • Chapter 10—Becoming a Master Strategist

    • Summary

  • Chapter 2: Preparing R for Battle

    • Time for action – downloading and installing R

      • Example: R 2.11.1 Mac OS X 10.5+ installation wizard demonstration

    • Time for action – issuing your first R command

    • Time for action – setting your R working directory

    • Summary

  • Chapter 3: Exploring the Mysterious Data Analysis Tool

    • Deciphering Zhuge Liang's magic square

    • Time for action – solving the first 4x4 magic square

      • Lines

      • Comments

      • Calculations

      • Output

      • Visualizing the R console

    • Summary

  • Chapter 4: Collecting and Organizing Information

    • Time for action – importing external data

      • read.csv(file)

      • comma-separated values (csv) files

    • Time for action – creating and calling variables

    • Time for action – accessing data within variables

      • variable$column notation

      • attach(variable) function

      • variable[row, column] notation

    • Time for action – manipulating variable data

      • Performing a calculation on an entire dataset

      • Performing a calculation on a row, column, or cell

      • Using variable data in function arguments

      • Saving a variable calculation into a new variable

    • Time for action – managing the R workspace

      • Listing the contents of the R workspace

      • Saving the contents of the R workspace

      • Loading the contents of the R workspace

      • Quitting R

      • Distinguishing between the R console and workspace

      • Saving the R console

    • Summary

  • Chapter 5: Assessing the Situation

    • Time for action – making an initial inference from our data

    • Examining our data

    • Time for action – creating a subset from a large dataset

      • Multi-argument functions

      • Variable-argument functions

      • Equivalency operators

      • subset(data, ...)

    • Time for action – deriving summary statistics

      • Means

      • Standard deviations

      • Ranges

      • summary(object)

      • Why use summary statistics?

    • Time for action – quantifying categorical variables

      • as.numeric(data)

      • Overwriting variables

    • Time for action – correlating variables

      • Interpreting correlations

      • cor(x, y)

      • cor(data)

      • NA values

    • Regression

    • Time for action – modelling with simple linear regression

      • lm(formula, data)

      • Linear model output

      • Linear model summary

      • Interpreting a linear regression model

    • Time for action – modelling with multiple linear regression

      • Interpreting the summary output

      • Explaining model differences

    • Time for action – modelling interactions

      • Interpreting interaction variables

    • Time for action – comparing and choosing models

      • Interpreting the model summaries

        • Interpreting the ANOVA results

      • anova(object, ...)

    • Summary

  • Chapter 6: Planning the Attack

    • Review of models

      • Head to head

      • Surround

      • Ambush

      • Fire

    • Predicting outcomes using regression models

      • Rating

      • Successfully executed

      • Number of Wei soldiers

      • Duration of battle

      • A word about assumptions

    • Time for action – calculating outcomes from regression models

    • Time for action – creating custom functions

      • function()

      • Extended lines

    • Time for action – creating resource-focused custom functions

    • Logistical considerations

      • Gold

      • Provisions

      • Equipment

      • Soldiers

      • Resource and cost summary

      • Resource map

    • Time for action – incorporating resource constraints into predictions

      • Gold cost function explanation

    • Assessing viability

    • Time for action – assessing the viability of potential strategies

      • Remember your assumptions

    • Summary

  • Chapter 7: Organizing the Battle Plans

    • Retracing and refining a complete analysis

    • Time for action – first steps

    • Time for action – data setup

      • read.table(...)

    • Time for action – data exploration

    • Time for action – model development

      • glm(...)

      • AIC(object, ...)

    • Time for action – model deployment

      • coef(object)

    • Time for action – last steps

    • The common steps to all R analyses

      • Step 1: Set your working directory

        • Comment your work

      • Step 2: Import your data (or load an existing workspace)

      • Step 3: Explore your data

      • Step 4: Conduct your analysis

      • Step 5: Save your workspace and console files

    • Summary

  • Chapter 8: Briefing the Emperor

    • Charts, graphs, and plots in R

    • Time for action – creating a bar chart

      • barplot(...)

      • Vectors

      • Graphic window

    • Time for action – customizing graphics

      • Graphic customization arguments

        • main, xlab, and ylab

        • xlim and ylim

        • col

      • legend(...)

    • Time for action – creating a scatterplot

      • Single scatterplot

      • Multiple scatterplots

    • Time for action – creating a line chart

      • type

      • Number-colon-number notation

    • Time for action – creating a box plot

      • boxplot(...)

    • Time for action – creating a histogram

      • hist(...)

    • Time for action – creating a pie chart

      • pie(...)

    • Time for action – exporting graphics

    • Summary

  • Chapter 9: Briefing the Generals

    • More charts, graphs, and plots in R

    • Time for action – customizing a bar chart

      • names

      • width and space

      • horiz

      • beside

      • density and angle

      • legend(...) with density, angle, and cex

    • Time for action – customizing a scatterplot

      • pch and cex

      • points(...)

      • legend(...)

      • abline(...)

    • Time for action – customizing a line chart

      • lwd

      • lines(...)

      • legend(...)

    • Time for action – customizing a box plot

      • range

      • axis(...)

    • Time for action – customizing a histogram

      • breaks

      • freq

    • Time for action – customizing a pie chart

      • Custom labels

      • legend(...)

    • Time for action – building a graphic

    • Time for action – building a graphic with multiple visuals

      • par(mfcol)

      • Graphics

        • Horizontal and vertical lines

        • Nested functions

    • Summary

  • Chapter 10: Becoming a Master Strategist

    • R's built-in resources

    • Time for action – using R's help function

      • help(...)

    • Time for action – expanding R with packages

      • Choose a CRAN mirror

      • Install a package

      • Load the package

      • Use the package

    • R's online resources

      • Websites

        • The R Project for Statistical Computing

        • Quick-R

        • R Programming wikibook

        • R Graph Gallery

        • Crantastic!

      • Blogs

        • R bloggers

        • R Tutorial Series

      • Online communities

        • R-help mailing list

        • Other mailing lists

      • Search engines

        • R Seek

        • Google

    • Summary

  • Appendix: Pop Quiz Answer Key

    • Chapter 2

    • Chapter 3

    • Chapter 4

    • Chapter 5

    • Chapter 6

    • Chapter 7

    • Chapter 8

    • Chapter 9

    • Chapter 10

  • Index
