Statistical Data Analysis Using SAS: Intermediate Statistical Methods, 2nd Edition


Mervyn G. Marasinghe • Kenneth J. Koehler

Statistical Data Analysis Using SAS: Intermediate Statistical Methods, Second Edition

Mervyn G. Marasinghe, Department of Statistics, Iowa State University, Ames, IA, USA
Kenneth J. Koehler, Department of Statistics, Iowa State University, Ames, IA, USA

Additional material to this book can be downloaded from http://extras.springer.com

ISSN 1431-875X; ISSN 2197-4136 (electronic)
Springer Texts in Statistics
ISBN 978-3-319-69238-8; ISBN 978-3-319-69239-5 (eBook)
https://doi.org/10.1007/978-3-319-69239-5
Library of Congress Control Number: 2017959325

The program code and output for this book was generated using SAS software, Version 9.4 of the SAS System for Windows. Copyright © 2002–2017 SAS Institute Inc. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc., Cary, NC, USA.

1st edition: © Springer Science+Business Media, LLC 2008
2nd edition: © Springer International Publishing AG, part of Springer Nature 2018

Preface

One of the hazards of writing a book based on a software system is that the release of a newer version of the software on which the book is based may supersede the appearance of the book in print. This happened to the authors with the publication of the earlier edition of this book. However, with a large and well-developed software system like SAS, this is not really an issue, particularly for the beginning user. Because of its complexity and the availability of a variety of analytical tools, the task of learning SAS and then mastering it for everyday use for data analysis has become a long-term project. That is what we found with the earlier edition. Although it was based on SAS Version 9.1, we find that the earlier version is still in use today, particularly as a reference and also by international SAS users to whom a later version of SAS may not be available. The new edition is based on the current version of SAS, Version 9.4, although it was released a number of years ago.

As discussed in the preface of the first edition, the aim of this book is to teach how to use the SAS software system for statistical analysis of data. While the book is intended to be used as a textbook in a second course in statistical methods taught primarily to advanced undergraduates in statistics and graduate students in many other disciplines that involve the use of statistics for data analysis, it would be a valuable source of information for researchers in the academic setting as well as professionals in industry and business who use the SAS system in their work. In particular, data analysis has become an important tool in the general area of data science, now being offered as a separate area of study.

The style of presentation of material in the revised book is the same as before: introduction of a brief theoretical and/or methodological description of each topic under discussion, including the statistical model used if applicable, and presentation of a problem as an application, followed by a SAS analysis of the data provided and a discussion of the results.

The primary reason for planning this revision is the fact that SAS has made a large number of changes beginning with SAS Version 9.2, as well as the introduction of a new system of statistical graphics that essentially replaced the SAS/GRAPH system that existed prior to that version. This necessitated modifications to most of the SAS programs used in the book as well as the rewriting of an entire chapter. The second reason was the incorporation of the ODS system for managing the tabular and graphical output produced from SAS procedures. Not only did this require the reproduction of all output presented in the older version of the textbook, it also required adding textual material explaining these changes and the new commands needed to use the new facility.

This book is intended for use as the textbook in a second course in applied statistics that covers topics in multiple regression and analysis of variance at an intermediate level. Generally, students enrolled in such courses are primarily graduate majors or advanced undergraduate students from a variety of disciplines. These students typically have taken an introductory-level statistical methods course that requires the use of a software system such as SAS for performing statistical analysis. Thus, students are expected to have an understanding of basic concepts of statistical inference, such as estimation and hypothesis testing, when they begin a course based on this book.

While the same approach that was used in the first edition is continued, we have rewritten material in almost every chapter; added new examples; completely replaced a chapter; added a new chapter based on SAS procedures for the analysis of nonlinear and generalized linear models; updated all SAS output, including graphics, that appears in the previous version; added more exercise problems to several chapters; and included completely new material on SAS templates in the appendix. These changes lengthened the book by about 200 pages.

We start with a gentler introductory example but proceed quickly to more advanced material and techniques, especially concerning the SAS data step. Important features such as data step programming, pointers, and line-hold specifiers are described in detail. The chapter that originally described how to use the SAS/GRAPH package was completely rewritten to describe the new Statistical Graphics (SG) procedures that are based on ODS Graphics.

The basic theory of statistical methods covered in the text is discussed briefly and then is extended beyond the elementary level. Particular attention has been given to topics that are usually not included in introductory courses. These include discussion of models involving random effects, covariance analysis, variable subset selection methods in regression, categorical data analysis, graphical tools for residual diagnostics, and the analysis of nonlinear and generalized linear models. We provide just sufficient information to facilitate the use of these techniques without excessive theoretical detail. A thorough knowledge of advanced theoretical material, such as the theory of the linear model or the theory of maximum likelihood estimation, is neither assumed nor required to assimilate the material presented.

SAS programs and SAS program outputs are used extensively to supplement the description of the analysis methods. Example data sets are taken from the areas of biological and physical sciences and engineering. Exercises are included in each chapter. Most exercises involve constructing SAS programs for the analysis of given observational or experimental data. Complete text files of all SAS examples used in the book can be downloaded from the Springer website for this book. Text versions of all data sets used in examples and exercises are also available from the website. Statistical tables are not reprinted in the book.

The first author has taught a one-semester course based on material from this book for many years. The coverage depends on the preparation and maturity level of students enrolled in a particular semester. In a class mainly composed of graduate students from disciplines other than statistics, with adequate knowledge of statistical methods and the use of SAS, the instructor may select more advanced topics for coverage and skip most of the introductory material. Otherwise, in a mixed class of undergraduate and graduate students with little experience using SAS, the coverage usually consists of a block of weeks introducing SAS, a block of weeks on regression and graphics, and a block of weeks on ANOVA applications. This amounts to approximately 60% of the material in the textbook. The structure of sections in the chapters facilitates this kind of selective coverage.

The first author wishes to thank Professor Kenneth J. Koehler, the former chair of the Department of Statistics at Iowa State University,
for agreeing to be a coauthor of this book and also for writing one of its chapters. He has taught several courses based on the material for that chapter, and some of the examples are taken from his consulting projects.

Mervyn G. Marasinghe, Associate Professor Emeritus, Department of Statistics, Iowa State University, Ames, IA 50011, USA
Kenneth J. Koehler, Professor, Department of Statistics, Iowa State University, Ames, IA 50011, USA

Contents

1 Introduction to the SAS Language
  1.1 Introduction
  1.2 Basic Language: A Summary of Rules and Syntax
  1.3 Creating SAS Data Sets
  1.4 The INPUT Statement
  1.5 SAS Data Step Programming Statements and Their Uses
  1.6 Data Step Processing
  1.7 More on INPUT Statement
    1.7.1 Use of Pointer Controls
    1.7.2 The trailing @ Line-Hold Specifier
    1.7.3 The trailing @@ Line-Hold Specifier
    1.7.4 Use of RETAIN Statement
    1.7.5 The Use of Line Pointer Controls
  1.8 Using SAS Procedures
  1.9 Exercises

2 More on SAS Programming and Some Applications
  2.1 More on the DATA and PROC Steps
    2.1.1 Reading Data from Files
    2.1.2 Combining SAS Data Sets
    2.1.3 Saving and Retrieving Permanent SAS Data Sets
    2.1.4 User-Defined Informats and Formats
    2.1.5 Creating SAS Data Sets in Procedure Steps
  2.2 SAS Procedures for Descriptive Statistics
    2.2.1 The UNIVARIATE Procedure
    2.2.2 The FREQ Procedure
  2.3 Some Useful Base SAS Procedures
    2.3.1 The TABULATE Procedure
    2.3.2 The REPORT Procedure
  2.4 Exercises

3 Introduction to SAS Graphics
  3.1 Introduction
  3.2 Template-Based Graphics (SAS/ODS Graphics)
  3.3 SAS Statistical Graphics Procedures
    3.3.1 The SGPLOT Procedure
    3.3.2 The SGPANEL Procedure
    3.3.3 The SGSCATTER Procedure
  3.4 ODS Graphics from Other SAS Procedures
  3.5 Exercises

4 Statistical Analysis of Regression Models
  4.1 An Introduction to Simple Linear Regression
    4.1.1 Simple Linear Regression Using PROC REG
    4.1.2 Lack of Fit Test
    4.1.3 Diagnostic Use of Case Statistics
    4.1.4 Prediction of New y Values Using Regression
  4.2 An Introduction to Multiple Regression Analysis
    4.2.1 Multiple Regression Analysis Using PROC REG
    4.2.2 Case Statistics and Residual Analysis
    4.2.3 Residual Plots
    4.2.4 Examining Relationships Among Regression Variables
  4.3 Types of Sums of Squares Computed in PROC REG
    4.3.1 Model Comparison Technique and Extra Sum of Squares
    4.3.2 Types of Sums of Squares in SAS
  4.4 Subset Selection Methods in Multiple Regression
    4.4.1 Subset Selection Using PROC REG
    4.4.2 Other Options Available in PROC REG for Model Selection
  4.5 Model Selection Using PROC GLMSELECT: Validation and Cross-Validation
  4.6 Exercises

5 Analysis of Variance Models
  5.1 Introduction
    5.1.1 Treatment Structure
    5.1.2 Experimental Designs
    5.1.3 Linear Models
  5.2 One-Way Classification
    5.2.1 Using PROC ANOVA to Analyze One-Way Classifications
    5.2.2 Making Preplanned (or A Priori) Comparisons Using PROC GLM
    5.2.3 Testing Orthogonal Polynomials Using Contrasts
  5.3 One-Way Analysis of Covariance
    5.3.1 Using PROC GLM to Perform One-Way Covariance Analysis
    5.3.2 One-Way Covariance Analysis: Testing for Equal Slopes
  5.4 A Two-Way Factorial in a Completely Randomized Design
    5.4.1 Analysis of a Two-Way Factorial Using PROC GLM
    5.4.2 Residual Analysis and Transformations
  5.5 Two-Way Factorial: Analysis of Interaction
  5.6 Two-Way Factorial: Unequal Sample Sizes
  5.7 Two-Way Classification: Randomized Complete Block Design
    5.7.1 Using PROC GLM to Analyze a RCBD
    5.7.2 Using PROC GLM to Test for Nonadditivity
  5.8 Exercises

6 Analysis of Variance: Random and Mixed Effects Models
  6.1 Introduction
  6.2 One-Way Random Effects Model
    6.2.1 Using PROC GLM to Analyze One-Way Random Effects Models
    6.2.2 Using PROC MIXED to Analyze One-Way Random Effects Models
  6.3 Two-Way Crossed Random Effects Model
    6.3.1 Using PROC GLM and PROC MIXED to Analyze Two-Way Crossed Random Effects Models
    6.3.2 Randomized Complete Block Design: Blocking When Treatment Factors Are Random
  6.4 Two-Way Nested Random Effects Model
    6.4.1 Using PROC GLM to Analyze Two-Way Nested Random Effects Models
    6.4.2 Using PROC MIXED to Analyze Two-Way Nested Random Effects Models
  6.5 Two-Way Mixed Effects Model
    6.5.1 Two-Way Mixed Effects Model: Randomized Complete Block Design
    6.5.2 Two-Way Mixed Effects Model: Crossed Classification
    6.5.3 Two-Way Mixed Effects Model: Nested Classification
  6.6 Models with Random and Nested Effects for More Complex Experiments
    6.6.1 Models for Nested Factorials
    6.6.2 Models for Split-Plot Experiments
    6.6.3 Analysis of Split-Plot Experiments Using PROC GLM
    6.6.4 Analysis of Split-Plot Experiments Using PROC MIXED
  6.7 Exercises

7 Beyond Regression and Analysis of Variance
  7.1 Introduction
  7.2 Nonlinear Models
    7.2.1 Introduction
    7.2.2 Growth Curve Models
    7.2.3 Pharmacokinetic Application of a Nonlinear Model
    7.2.4 A Model for Biochemical Reactions
  7.3 Generalized Linear Models
    7.3.1 Introduction
    7.3.2 Logistic Regression
    7.3.3 Poisson Regression
  7.4 Generalized Linear Models with Overdispersion
    7.4.1 Introduction
    7.4.2 Binomial and Poisson Models with Overdispersion
    7.4.3 Negative Binomial Models
  7.5 Further Extensions of Generalized Linear Models
    7.5.1 Introduction
    7.5.2 Poisson Regression with Rates
    7.5.3 Logistic Regression with Multiple Response Categories
  7.6 Exercises

Appendix A SAS Templates
  A.1 Introduction
    A.1.1 What Are Templates?
    A.1.2 Where Are the SAS Default Templates Located?
    A.1.3 More on Template Stores
  A.2 Templates and Their Composition
    A.2.1 Style Templates
    A.2.2 Style Elements and Attributes
    A.2.3 Tabular Templates
    A.2.4 Simple Table Template Modification
    A.2.5 Other Types of Templates
  A.3 Customizing Graphs by Editing Graphical Templates
  A.4 Creating Customized Graphs by Extracting Code from Standard Graphical Templates

Appendix B Tables

References

Index

1 Introduction to the SAS Language

1.1 Introduction

The SAS system is a computer package program for performing statistical analysis of data. The system incorporates data manipulation and input/output capabilities as well as an extensive collection of procedures for statistical analysis of data. The SAS system achieves its versatility by providing users with the ability to write their own program statements to manipulate data as well as to call up SAS routines, called procedures, for performing major statistical analyses on specified data sets. The user-written program statements usually perform data modifications such as transforming values of existing variables, creating new variables using values of existing variables, or selecting subsets of observations. The statements and the syntax available to perform these manipulations are extensive enough that they constitute an entire programming language. Once data sets have been prepared in this way, they are used as input to statistical procedures that perform the desired analysis of the data. SAS will perform any statistical analysis that the user correctly specifies using appropriate SAS procedure statements.

When SAS programs are run under the SAS windowing environment, the source code is entered in the SAS Program Editor window and submitted for execution. A Log window, which shows the details of execution of the SAS code, and an Output window, which shows the results, are also parts of this system.
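As a minimal sketch of the two kinds of statements described above (user-written data step statements followed by a call to a procedure), the following program creates a small data set, derives new variables from existing ones, selects a subset of observations, and passes the result to a procedure. The data set name, variable names, and values are invented for illustration and do not come from the book.

```sas
/* Hypothetical data: all names and values are invented for illustration */
data work.growth;
   input id height_cm weight_kg;
   /* create a new variable using values of existing variables */
   bmi = weight_kg / (height_cm / 100)**2;
   /* transform an existing variable */
   log_weight = log(weight_kg);
   /* subsetting IF: keep only observations that satisfy the condition */
   if height_cm >= 150;
   datalines;
1 172 68.5
2 148 50.2
3 165 59.0
4 181 82.3
;
run;

/* call a SAS procedure on the prepared data set */
proc means data=work.growth n mean std min max;
   var bmi log_weight;
run;
```

When a program like this is submitted, notes about each step would be expected in the Log window and the summary statistics in the output destination that is currently open.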
Traditionally, results of a SAS procedure were displayed in the Output window in the listing format, using monospace fonts, with which users of earlier versions of SAS are more familiar. SAS provides the user the ability to manage where (the destination) and in what format the output is produced and displayed, via the SAS Output Delivery System (ODS). For example, output from executing a SAS procedure may be directed to a pdf or an html formatted file, and the content to be included in the output can be selected.

Appendix B Tables

Table B.14 (continued)

Case: Case no. of patient
lcavol: Log cancer volume
lweight: Log prostate weight
age: Age of patient in years
lbph: Log benign prostatic hyperplasia amount
svi: Seminal vesicle invasion
lcp: Log capsular penetration
gleason: Gleason score
pgg45: Percentage Gleason scores 4 or 5
lpsa: Log prostate specific antigen

Columns: Case No, lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45, lpsa

Cases 40 to 76:
0.777507176 3.013081 56 0.73607336 −0.16251873 2.2772673 0.620576488 3.141775 60 −1.38627436 −1.38627436 80 2.2775726 1.442201773 3.68261 68 −1.38627436 −1.38627436 10 2.3075726 0.58221562 3.865777 62 1.71377773 −0.43078272 2.3272777 1.771556762 3.876707 61 −1.38627436 0.81073022 2.3747058 1.486137676 3.407476 66 1.74717785 −0.43078272 20 2.5217206 1.663726078 3.372827 61 0.61518564 −1.38627436 15 2.5533438 2.727852828 3.775445 77 1.87746505 2.65675671 100 2.5687881 1.16315081 4.035125 68 1.71377773 −0.43078272 40 2.5687881 1.745715531 3.478022 43 −1.38627436 −1.38627436 2.5715164 1.220827721 3.568123 70 1.37371558 −0.7785077 2.5715164 1.071723301 3.773603 68 −1.38627436 −1.38627436 50 2.6567567 1.660131027 4.234831 64 2.07317173 −1.38627436 2.677571 0.512823626 3.633631 64 1.4727041 0.04877016 70 2.6844403 2.12704052 4.121473 68 1.76644166 1.44671878 40 2.6712431 3.153570358 3.516013 57 −1.38627436 −1.38627436 2.7047113 1.266747603 4.280132 66 2.12226154 −1.38627436 15 2.7180005 0.77455764 2.865054 47 −1.38627436 0.50077527 2.7880727 0.463734016 3.764682 47 1.42310833 −1.38627436 2.7742277 0.542324271 4.178226 70 0.43825473 −1.38627436 20 2.8063861 1.061256502 3.851211 61 1.27472717 −1.38627436 40 2.8124102 0.457424847 4.524502 73 2.32630162 −1.38627436 2.8417782 1.777417706 3.717651 63 1.61738824 1.7075425 40 2.8535725 2.77570885 3.524887 72 −1.38627436 1.55814462 75 2.8535725 2.034705648 3.717011 66 2.00821403 2.1102132 60 2.8820035 2.073171727 3.623007 64 −1.38627436 −1.38627436 2.8820035 1.458615023 3.836221 61 1.32175584 −0.43078272 20 2.8875701 2.02287117 3.878466 68 1.78337122 1.32175584 70 2.7204678 2.178335072 4.050715 72 2.30757263 −0.43078272 10 2.7626724 −0.446287103 4.408547 67 −1.38627436 −1.38627436 2.7626724 1.173722468 4.780383 72 2.32630162 −0.7785077 2.7727753 1.864080131 3.573174 60 −1.38627436 1.32175584 60 3.0130807 1.160020717 3.341073 77 1.74717785 −1.38627436 25 3.0373537 1.214712744 3.825375 67 −1.38627436 0.22314355 20 3.0563567 1.838761071 3.236716 60 0.43825473 1.178655 70 3.0750055 2.777226163 3.847083 67 −1.38627436 1.7075425 20 3.2752562 3.141130476 3.263847 68 −0.05127327 2.42036813 50 3.3375474

Cases 77 to 97:
2.010874777 4.433787 72 2.12226154 0.50077527 60 3.3728271 2.537657215 4.354784 78 2.32630162 −1.38627436 10 3.4355788 2.648300177 3.582127 67 −1.38627436 2.58377755 70 3.4578727 2.777440177 3.823172 63 −1.38627436 0.37156356 50 3.5130367 1.467874348 3.070376 66 0.55761577 0.22314355 40 3.5160131 2.513656063 3.473518 57 0.43825473 2.32727771 60 3.5307626 2.613006652 3.888754 77 −0.52763274 0.55761577 30 3.5652784 2.677570774 3.838376 65 1.11514157 1.74717785 70 3.5707402 1.562346305 3.707707 60 1.67561561 0.81073022 30 3.5876767 3.302847257 3.51878 64 −1.38627436 2.32727771 60 3.6307855 2.024173067 3.731677 58 1.63877671 −1.38627436 3.6800707 1.731655545 3.367018 62 −1.38627436 0.30010457 30 3.7123518 2.807573831 4.718052 65 −1.38627436 2.46385324 60 3.7843437 1.562346305 3.67511 76 0.73607336 0.81073022 75 3.773603 3.246470772 4.101817 68 −1.38627436 −1.38627436 4.027806 2.532702848 3.677566 61 1.34807315 −1.38627436 15 4.1275508 2.830267834 3.876376 68 −1.38627436 1.32175584 60 4.3851468 3.821003607 3.876707 44 −1.38627436 2.1670537 40 4.6844434 2.707447357 3.376185 52 −1.38627436 2.46385324 10 5.1431245 2.882563575 3.77371 68 1.55814462 1.55814462 80 5.477507 3.471766453 3.774778 68 0.43825473 2.70416508 20 5.5827322

Table B.15 Air pollution data for selected US cities

City: City no.
SO2: Sulfur dioxide content of air in micrograms per cubic meter
AvTemp: Average annual temperature in degrees Fahrenheit
NumFirms: Number of manufacturing enterprises employing ≥ 20 workers
Population: Population size in thousands from the 1970 census
WindSpeed: Average annual wind speed in miles per hour
AvPrecip: Average annual precipitation in inches
PrecipDays: Average number of days with precipitation per year

Columns: City, SO2, AvTemp, NumFirms, Population, WindSpeed, AvPrecip, PrecipDays

Cities 1 to 41:
10 70.3 213 582 7.05 36 13 61 71 132 8.2 48.52 100 12 56.7 453 716 8.7 20.66 67 17 51.7 454 515 12.75 86 56 47.1 412 158 43.37 127 36 54 80 80 40.25 114 27 57.3 434 757 7.3 38.87 111 14 68.4 136 527 8.8 54.47 116 10 75.5 207 335 57.8 128 24 61.5 368 477 7.1 48.34 115 110 50.6 3344 3367 10.4 34.44 122 28 52.3 361 746 7.7 38.74 121 17 47 104 201 11.2 30.85 103 56.6 125 277 12.7 30.58
82 30 55.6 271 573 8.3 43.11 123 68.3 204 361 8.4 56.77 113 47 55 625 705 7.6 41.31 111 35 47.7 1064 1513 10.1 30.76 127 27 43.5 677 744 10.6 25.74 137 14 54.5 381 507 10 37 77 56 55.7 775 622 7.5 35.87 105 14 51.5 181 347 10.7 30.18 78 11 56.8 46 244 8.7 7.77 58 46 47.6 44 116 8.8 33.36 135 11 47.1 371 463 12.4 36.11 166 23 54 462 453 7.1 37.04 132 65 47.7 1007 751 10.7 34.77 155 26 51.5 266 540 8.6 37.01 134 67 54.6 1672 1750 7.6 37.73 115 61 50.4 347 520 7.4 36.22 147 74 50 343 177 10.6 42.75 125 10 61.6 337 624 7.2 47.1 105 18 57.4 275 448 7.7 46 117 66.2 641 844 10.7 35.74 78 10 68.7 721 1233 10.8 48.17 103 28 51 137 176 8.7 15.17 87 31 57.3 76 308 10.6 44.68 116 26 57.8 177 277 7.6 42.57 115 27 51.1 377 531 7.4 38.77 164 31 55.2 35 71 6.5 40.75 148 16 45.7 567 717 11.8 27.07 123 Appendix B Tables 667 Table B.16 Number of failures (y in column (6)) for 90 valves from a pressurized nuclear reactor with operating time in 100 h units (z in column (7)) (1) (2) (3) (4) (5) (6) (7) (1) (2) (3) (4) (5) (6) (7) (1) (2) (3) (4) (5) (6) (7) 1752 1752 1 876 2 0 876 876 0 438 1 1752 2628 1 438 2 438 2 876 876 0 1752 1314 0 438 1 876 0 1752 0 876 0 438 438 4 1 438 876 1 1 15,768 1 2 1752 1 0 876 2 876 3 3504 3 1 6570 3 0 1752 1 438 0 876 4818 23 2628 21 1752 1 1752 0 1752 11 13,578 13,578 438 1 876 0 438 0 438 876 3 2 438 3 0 438 3 1 3066 3 0 1752 3 3504 3 0 1314 3 13 876 3 3 1314 3 0 1314 3 0 2190 4 1752 4 4380 0 1752 3 438 4 2 3504 4 0 1752 4 1314 0 438 2 1314 2 0 876 438 0 2190 1 438 0 1314 0 876 1752 0 1752 5 1 438 5 1314 5 0 3504 1 438 0 876 0 4818 1 438 2 438 2 0 876 3 1752 3 0 876 2 2190 6132 5 0 876 1 2190 0 876 1314 4 0 438 4 438 5 0 438 The five explanatory variables are System, Operator type, Valve type, Head size, and Operation mode (in columns (1)–(5), respectively), taking values as described in the text 668 Appendix B Tables Table B.17 ODS graphics plots options for the REG procedure Plot description Adjusted R-square statistic for models examined doing 
variable selection AIC statistic for models examined doing variable selection BIC statistic for models examined doing variable selection Cooks D statistic versus observation number Cp statistic for models examined doing variable selection Panel of fit diagnostics Regression line, confidence limits, and prediction limits overlaid on scatter plot of data Dependent variable versus predicted values Partial regression plot Normal quantile plot of residuals Residuals versus predicted values Plot of residuals versus regressor R-square statistic for models examined doing variable selection Studentized residuals versus leverage Studentized residuals versus predicted values SBC statistic for models examined doing variable selection Panel of fit statistics for models examined doing variable selection PLOTS option ADJRSQ ODS graph name AdjrsqPlot AIC AICPlot BIC BICPlot COOKSD CooksDPlot CP CPPlot DIAGNOSTICS FIT DiagnosticsPanel FitPlot OBSERVEDBYPREDICTED ObservedByPredicted PARTIAL QQ PartialPlot QQPlot RESIDUALBYPREDICTED ResidualByPredicted RESIDUALS ResidualPlot RSQUARE RSquarePlot RSTUDENTBYLEVERAGE RStudentByLeverage RSTUDENTBYPREDICTED RStudentByPredicted SBC SBCPlot CRITERIA SelectionCriterionPanel Appendix B Tables 669 Table B.18 ODS tables produced by the REG procedure ODS table name ANOVA Description Model ANOVA table CollinDiag Collinearity Diagnostics table Corr Correlation matrix for analysis variables CorrB Correlation of estimates CovB Covariance of estimates CrossProducts Bordered model XX matrix FitStatistics Model fit statistics InvXPX Bordered XX inverse matrix OutputStatistics Output statistics table ParameterEstimates Model parameter estimates ResidualStatistics SelParmEst SelectionSummary Residual statistics and PRESS statistic Parameter estimates for selection methods Selection summary for FORWARD, BACKWARD, and STEPWISE methods SimpleStatistics Simple statistics for analysis variables SubsetSelSummary Selection summary for R-square, Adj-RSq, and Cp 
methods USSCP Uncorrected SSCP matrix for analysis variables Statement Option MODEL Default MODEL COLLIN PROC ALL, CORR MODEL CORRB MODEL COVB MODEL ALL, XPX MODEL MODEL Default I MODEL ALL, CLI, CLM, INFLUENCE, P, R Default if SELECTION= is not specified ALL, CLI, CLM, INFLUENCE, P, R MODEL MODEL MODEL MODEL SELECTION=BACKWARD | FORWARD | STEPWISE | MAXR | MINR SELECTION=BACKWARD | FORWARD | STEPWISE PROC ALL, SIMPLE MODEL SELECTION=RSQUARE | ADJRSQ | CP PROC ALL, USSCP References Agresti, A (2013) Categorical data analysis (3rd ed.) Hoboken, NJ: Wiley Akaike, H (1981) Likelihood of a model and information criteria Journal of Econometrics, 16, 3–14 Armitage, P., & Berry, G (1994) Statistical methods in medical research (3rd ed.) Malden, MA: Blackwell Bates, D.M., & Watts, D.G (1988) Nonlinear regression analysis and its applications New York, NY: Wiley Bliss, C I (1935) The calculation of the dosage-mortality curve Annals of Applied Biology, 22 (1), 134–167 Bliss, C I (1970) Statistics in biology (Vol 2) New York, NY: McGraw-Hill Bowerman, B L., & O’Connell, R T (2004) Business statistics in practice (4th ed.) 
Chicago, IL: McGraw-Hill/Irwin Box, G E P., Hunter, W G., & Hunter, J S (1978) Statistics for experimenters New York, NY: Wiley Breslow, N E (1984) Extra-Poisson variation in Log-linear models Applied Statistics, 33 (1), 38–44 Chambers, J M., Cleveland, W S., Kleiner, B., & Tukey, P A (1983) Graphical methods in data analysis Belmont, CA: Wadsworth Chen, W W., Neipel, M., & Sorger, P K (2010) Classic and contemporary approaches to modeling biochemical reactions Genes & Development, 24 (17), 1861–1875 Collett, D (2003) Modelling binary data London: Chapman & Hall Crowder, M J (1978) Beta-binomial Anova for proportions Applied Statistics, 27 (1), 34–37 Deak, N A., & Johnson, L A (2007) Effects of extraction temperature and preservation method on functionality of soy protein Journal of the American Oil Chemists’ Society, 84, 259–268 Devore, J L (1982) Probability and statistics for engineering and the sciences Monterey, CA: Brooks/Cole 672 References Draper, N R., & Smith, H (1981) Applied regression analysis (2nd ed.) New York, NY: Wiley Draper, N R., & Smith, H (1998) Applied regression analysis (3rd ed.) New York, NY: Wiley Dunn, O J., & Clark, V A (1987) Applied statistics: Analysis of variance and regression analysis (2nd ed.) New York, NY: Wiley Efron, B., Hastie, T J., Johnstone, I M., & Tibshirani, R (2004) Least angle regression (with discussion) Annals of Statistics, 32, 407–499 Endrenyi, L (1981) Kinetic data analysis New York: Springer Graubard, B I., & Korn, E L (1987) Choice of column scores for testing independence in ordered 2xk contingency tables (with discussion) Biometrics, 43, 471–476 Henderson, C R., Kempthorne, O., Searle, S R., & von Krosigk, C N (1959) Estimation of environmental and genetic trends from records subject to culling Biometrics, 15, 192–218 Kenward, M G., & Roger, J H (1997) Small sample inference for fixed effects from restricted maximum likelihood Biometrics, 53, 983–997 Kirk, R E (1982) Experimental design (2nd ed.) 
... These include data, datalines, array, label, length, format, informat, by, and where statements.

1.3 Creating SAS Data Sets

Creating a SAS data set suitable for subsequent analysis ...

... stored then in SAS data sets as character or numeric values as appropriate.

SAS Data Sets

SAS data sets consist of data values arranged in a rectangular array as displayed in Fig. 1.7. Data values ...

... row of data values in a SAS data set may represent an observation. However, it is possible for each observation in a SAS data set to be formed using data values obtained from several input data

Date posted: 04/03/2019, 08:47


Table of contents

Preface

Contents

1 Intro to SAS Language

1.1 Introduction

1.2 Basic Language: A Summary of Rules and Syntax

1.3 Creating SAS Data Sets

1.4 The INPUT Statement

1.5 SAS Data Step Programming Statements and Their Uses

Example 1.5.1

SAS Functions

Example 1.5.2

Example 1.5.3

Example 1.5.4

Example 1.5.5

Example 1.5.6

Example 1.5.7

Example 1.5.8

Example 1.5.9

Example 1.5.10

1.6 Data Step Processing

1.7 More on INPUT Statement

1.7.1 Use of Pointer Controls

1.7.2 The trailing @ Line-Hold Specifier
