Elementary statistics technology update 11th edition part 3

Goodness-of-Fit and Contingency Tables

11-1 Review and Preview
11-2 Goodness-of-Fit
11-3 Contingency Tables
11-4 McNemar’s Test for Matched Pairs

CHAPTER PROBLEM

Is the nurse a serial killer?

Three alert nurses at the Veteran’s Affairs Medical Center in Northampton, Massachusetts noticed an unusually high number of deaths at times when another nurse, Kristen Gilbert, was working. Those same nurses later noticed missing supplies of the drug epinephrine, which is a synthetic adrenaline that stimulates the heart. They reported their growing concerns, and an investigation followed. Kristen Gilbert was arrested and charged with four counts of murder and two counts of attempted murder. When seeking a grand jury indictment, prosecutors provided a key piece of evidence consisting of a two-way table showing the numbers of shifts with deaths when Gilbert was working. See Table 11-1.

Table 11-1 Two-Way Table with Deaths When Gilbert Was Working

                           Shifts with a death   Shifts without a death
Gilbert was working                 40                    217
Gilbert was not working             34                   1350

The numbers in Table 11-1 might be better understood with a graph, such as Figure 11-1, which shows the death rates during shifts when Gilbert was working and when she was not working. Figure 11-1 seems to make it clear that shifts when Gilbert was working had a much higher death rate than shifts when she was not working, but we need to determine whether those results are statistically significant.

Figure 11-1 Bar Graph of Death Rates with Gilbert Working and Not Working

George Cobb, a leading statistician and statistics educator, became involved in the Gilbert case at the request of an attorney for the defense. Cobb wrote a report stating that the data in Table 11-1 should have been presented to the grand jury (as it was) for purposes of indictment, but that it should not be presented at the actual trial. He noted that the data in Table 11-1 are based on observations and do not show that Gilbert actually caused deaths. Also, Table 11-1 includes
information about many other deaths that were not relevant to the trial. The judge ruled that the data in Table 11-1 could not be used at the trial. Kristen Gilbert was convicted on other evidence and is now serving a sentence of life in prison, without the possibility of parole.

This chapter will include methods for analyzing data in tables, such as Table 11-1. We will analyze Table 11-1 to see what conclusions could be presented to the grand jury that provided the indictment.

11-1 Review and Preview

We began a study of inferential statistics in Chapter 7 when we presented methods for estimating a parameter for a single population and in Chapter 8 when we presented methods of testing claims about a single population. In Chapter 9 we extended those methods to situations involving two populations. In Chapter 10 we considered methods of correlation and regression using paired sample data. In this chapter we use statistical methods for analyzing categorical (or qualitative, or attribute) data that can be separated into different cells. We consider hypothesis tests of a claim that the observed frequency counts agree with some claimed distribution. We also consider contingency tables (or two-way frequency tables), which consist of frequency counts arranged in a table with at least two rows and two columns. We conclude this chapter by considering two-way tables involving data consisting of matched pairs.

The methods of this chapter use the same $\chi^2$ (chi-square) distribution that was first introduced in Section 7-5. See Section 7-5 for a quick review of properties of the $\chi^2$ distribution.

11-2 Goodness-of-Fit

Key Concept In this section we consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way frequency table). We will use a hypothesis test for the claim that the observed frequency counts agree with some claimed distribution, so that there is a good fit of the observed data with the
claimed distribution. Because we test for how well an observed frequency distribution fits some specified theoretical distribution, the method of this section is called a goodness-of-fit test.

A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.

Objective

Conduct a goodness-of-fit test.

Notation

O  represents the observed frequency of an outcome, found by tabulating the sample data.
E  represents the expected frequency of an outcome, found by assuming that the distribution is as claimed.
k  represents the number of different categories or outcomes.
n  represents the total number of trials (or observed sample values).

Requirements

1. The data have been randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. For each category, the expected frequency is at least 5. (The expected frequency for a category is the frequency that would occur if the data actually have the distribution that is being claimed. There is no requirement that the observed frequency for each category must be at least 5.)
Test Statistic for Goodness-of-Fit Tests

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Critical Values

1. Critical values are found in Table A-4 by using k - 1 degrees of freedom, where k is the number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.

P-Values

P-values are typically provided by computer software, or a range of P-values can be found from Table A-4.

Finding Expected Frequencies

Conducting a goodness-of-fit test requires that we identify the observed frequencies, then determine the frequencies expected with the claimed distribution. Table 11-2 includes observed frequencies with a sum of 80, so n = 80. If we assume that the 80 digits were obtained from a population in which all digits are equally likely, then we expect that each digit should occur in 1/10 of the 80 trials, so each of the 10 expected frequencies is given by E = 80/10 = 8. In general, if we are assuming that all of the expected frequencies are equal, each expected frequency is E = n/k, where n is the total number of observations and k is the number of categories. In other cases in which the expected frequencies are not all equal, we can often find the expected frequency for each category by multiplying the sum of all observed frequencies and the probability p for the category, so E = np. We summarize these two procedures here.

• Expected frequencies are equal: E = n/k.
• Expected frequencies are not all equal: E = np for each individual category.

As good as these two preceding formulas for E might be, it is better to use an informal approach. Just ask, “How can the observed frequencies be split up among the different categories so that there is perfect agreement with the claimed distribution?” Also, note that the observed frequencies must all be whole numbers because they represent actual counts, but the expected frequencies need not be whole numbers. For example, when rolling a single die 33 times, the expected frequency for each possible outcome is 33/6 = 5.5. The expected frequency for rolling any particular face is 5.5,
even though it is impossible for that outcome to occur exactly 5.5 times.

We know that sample frequencies typically deviate somewhat from the values we theoretically expect, so we now present the key question: Are the differences between the actual observed values O and the theoretically expected values E statistically significant? We need a measure of the discrepancy between the O and E values, so we use the test statistic given with the requirements and critical values. (Later, we will explain how this test statistic was developed, but you can see that it has differences of O - E as a key component.)

The $\chi^2$ test statistic is based on differences between the observed and expected values. If the observed and expected values are close, the $\chi^2$ test statistic will be small and the P-value will be large. If the observed and expected frequencies are not close, the $\chi^2$ test statistic will be large and the P-value will be small. Figure 11-2 summarizes this relationship. The hypothesis tests of this section are always right-tailed, because the critical value and critical region are located at the extreme right of the distribution. If confused, just remember this: “If the P is low, the null must go.” (If the P-value is small, reject the null hypothesis that the distribution is as claimed.)

Figure 11-2 Relationships Among the $\chi^2$ Test Statistic, P-Value, and Goodness-of-Fit
[Flowchart: Compare the observed O values to the corresponding expected E values. Os and Es are close: small $\chi^2$ value, large P-value, fail to reject H0, good fit with assumed distribution. Os and Es are far apart: large $\chi^2$ value, small P-value, reject H0, not a good fit with assumed distribution.]
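The relationship between close or distant O and E values, the size of the test statistic, and the P-value can be checked numerically. The following is a minimal Python sketch and not part of the original text. The function names (`expected_equal`, `chi_square_statistic`, `chi_square_p_value`) and both sets of die-roll counts are hypothetical, chosen only so that they total the 33 rolls of the die example. The P-value is taken as the right-tail area of the chi-square distribution with k - 1 degrees of freedom, computed here from a series for the incomplete gamma function rather than read from Table A-4.

```python
import math

def expected_equal(n, k):
    """Expected frequency for each of k equally likely categories: E = n/k."""
    return [n / k] * k

def chi_square_statistic(observed, expected):
    """Goodness-of-fit test statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_value(statistic, df):
    """Right-tail area beyond `statistic` for a chi-square distribution with df degrees of freedom."""
    if statistic <= 0:
        return 1.0
    a, z = df / 2.0, statistic / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower_tail)

# Hypothetical counts for 33 rolls of a die (illustrative only, not data from the text).
close_counts = [5, 6, 5, 6, 5, 6]
far_counts = [1, 2, 3, 5, 9, 13]
expected = expected_equal(33, 6)  # each E = 33/6 = 5.5

for label, counts in [("close", close_counts), ("far apart", far_counts)]:
    stat = chi_square_statistic(counts, expected)
    p = chi_square_p_value(stat, df=len(counts) - 1)
    print(f"Os and Es {label}: chi-square = {stat:.3f}, P-value = {p:.4f}")
```

With the first set of counts the statistic is small and the P-value is large, so the null hypothesis is not rejected. With the second set the statistic is large and the P-value is small, which is the "if the P is low, the null must go" case.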
Table 11-2 Last Digits of Weights

Last Digit   Frequency
    0            7
    1           14
    2            6
    3           10
    4            8
    5            4
    6            5
    7            6
    8           12
    9            8

Once we know how to find the value of the test statistic and the critical value, we can test hypotheses by using the same general procedures introduced in Chapter 8.

Example 1 Last Digits of Weights

A data set in Appendix B includes weights from 40 randomly selected adult males and 40 randomly selected adult females. Those weights were obtained as part of the National Health Examination Survey. When obtaining weights of subjects, it is extremely important to actually weigh individuals instead of asking them to report their weights. By analyzing the last digits of weights, researchers can verify that weights were obtained through actual measurements instead of being reported. When people report weights, they typically round to a whole number, so reported weights tend to have many last digits consisting of 0. In contrast, if people are actually weighed with a scale having precision to the nearest 0.1 pound, the weights tend to have last digits that are uniformly distributed, with 0, 1, 2, ..., 9 all occurring with roughly the same frequencies. Table 11-2 shows the frequency distribution of the last digits from the 80 weights listed in that data set. (For example, the weight of 201.5 lb has a last digit of 5, and this is one of the data values included in Table 11-2.) Test the claim that the sample is from a population of weights in which the last digits do not occur with the same frequency. Based on the results, what can we conclude about the procedure used to obtain the weights?
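The tallying step behind a table of last digits can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for the survey: the function names `last_digit` and `tally_last_digits` are hypothetical, and the list of weights is made up for the demonstration, apart from 201.5 lb, which is the value mentioned in the example.

```python
from collections import Counter

def last_digit(weight_lb):
    """Return the tenths digit of a weight recorded to the nearest 0.1 lb."""
    # Multiply by 10 and round so that 201.5 becomes 2015, then keep the final digit.
    return round(weight_lb * 10) % 10

def tally_last_digits(weights):
    """Return ten frequencies, one for each last digit 0 through 9."""
    counts = Counter(last_digit(w) for w in weights)
    return [counts.get(d, 0) for d in range(10)]

# Hypothetical weights (illustrative only); 201.5 is the value cited in the example.
weights = [201.5, 169.1, 144.2, 179.3, 175.8, 152.6, 166.8, 135.0]

frequencies = tally_last_digits(weights)
for digit, freq in enumerate(frequencies):
    print(f"last digit {digit}: frequency {freq}")
```

The resulting list of ten frequencies plays the role of the observed values O in the goodness-of-fit test.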
REQUIREMENT CHECK (1) The data come from randomly selected subjects. (2) The data consist of frequency counts, as shown in Table 11-2. (3) With 80 sample values and 10 categories that are claimed to be equally likely, each expected frequency is 8, so each expected frequency does satisfy the requirement of being a value of at least 5. All of the requirements are satisfied.

The claim that the digits do not occur with the same frequency is equivalent to the claim that the relative frequencies or probabilities of the 10 cells ($p_0, p_1, \ldots, p_9$) are not all equal. We will use the traditional method for testing hypotheses (see Figure 8-9).

Step 1: The original claim is that the digits do not occur with the same frequency. That is, at least one of the probabilities $p_0, p_1, \ldots, p_9$ is different from the others.

Step 2: If the original claim is false, then all of the probabilities are the same. That is, $p_0 = p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = p_8 = p_9$.

Step 3: The null hypothesis must contain the condition of equality, so we have

$H_0$: $p_0 = p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = p_8 = p_9$
$H_1$: At least one of the probabilities is different from the others.

Step 4: No significance level was specified, so we select $\alpha = 0.05$.

Step 5: Because we are testing a claim about the distribution of the last digits being a uniform distribution, we use the goodness-of-fit test described in this section. The $\chi^2$ distribution is used with the test statistic given earlier.

Step 6: The observed frequencies O are listed in Table 11-2. Each corresponding expected frequency E is equal to 8 (because the 80 digits would be uniformly distributed among the 10 categories). Table 11-3 shows the computation of the $\chi^2$ test statistic. The test statistic is $\chi^2 = 11.250$. The critical value is $\chi^2 = 16.919$ (found in Table A-4 with $\alpha = 0.05$ in the right tail and degrees of freedom equal to k - 1 = 9). The test statistic and critical value are shown in Figure 11-3.

Step 7: Because the test statistic does not fall in the critical region,
there is not sufficient evidence to reject the null hypothesis.

Step 8: There is not sufficient evidence to support the claim that the last digits do not occur with the same relative frequency.

This goodness-of-fit test suggests that the last digits provide a reasonably good fit with the claimed distribution of equally likely frequencies. Instead of asking the subjects how much they weigh, it appears that their weights were actually measured as they should have been.

Example 1 involves a situation in which the claimed frequencies for the different categories are all equal. The methods of this section can also be used when the hypothesized probabilities (or frequencies) are different, as shown in Example 2.

Mendel’s Data Falsified?

Because some of Mendel’s data from his famous genetics experiments seemed too perfect to be true, statistician R. A. Fisher concluded that the data were probably falsified. He used a chi-square distribution to show that when a test statistic is extremely far to the left and results in a P-value very close to 1, the sample data fit the claimed distribution almost perfectly, and this is evidence that the sample data have not been randomly selected. It has been suggested that Mendel’s gardener knew what results Mendel’s theory predicted, and subsequently adjusted results to fit that theory. Ira Pilgrim wrote in The Journal of Heredity that this use of the chi-square distribution is not appropriate. He notes that the question is not about goodness-of-fit with a particular distribution, but whether the data are from a sample that is truly random. Pilgrim used the binomial probability formula to find the probabilities of the results obtained in Mendel’s experiments. Based on his results, Pilgrim concludes that “there is no reason whatever to question Mendel’s honesty.” It appears that Mendel’s results are not too good to be true, and they could have been obtained from a truly random process.

Which Car Seats Are Safest?
Many people believe that the back seat of a car is the safest place to sit, but is it? University of Buffalo researchers analyzed more than 60,000 fatal car crashes and found that the middle back seat is the safest place to sit in a car. They found that passengers in that seat are 86% more likely to survive than those who sit in the front seats, and 25% more likely to survive than those sitting in either of the back seats nearest the windows. An analysis of seat belt use showed that when not wearing a seat belt in the back seat, passengers are three times more likely to die in a crash than those wearing seat belts in that same seat. Passengers concerned with safety should sit in the middle back seat wearing a seat belt.

Table 11-3 Calculating the $\chi^2$ Test Statistic for the Last Digits of Weights

Last Digit   Observed Frequency O   Expected Frequency E   O - E   (O - E)^2   (O - E)^2 / E
    0                 7                      8               -1         1          0.125
    1                14                      8                6        36          4.500
    2                 6                      8               -2         4          0.500
    3                10                      8                2         4          0.500
    4                 8                      8                0         0          0.000
    5                 4                      8               -4        16          2.000
    6                 5                      8               -3         9          1.125
    7                 6                      8               -2         4          0.500
    8                12                      8                4        16          2.000
    9                 8                      8                0         0          0.000

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 11.250$$

Figure 11-3 Test of $p_0 = p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = p_8 = p_9$
[Graph of the chi-square distribution: the fail-to-reject region lies to the left of the critical value $\chi^2 = 16.919$ and the reject region lies to its right. The sample data give $\chi^2 = 11.250$, which is in the fail-to-reject region.]

Example 2 World Series Games

Table 11-4 lists the numbers of games played in the baseball World Series, as of this writing. That table also includes the expected proportions for the numbers of games in a World Series, assuming that in each series, both teams have about the same chance of winning. Use a 0.05 significance level to test the claim that the actual numbers of games fit the distribution indicated by the probabilities.

Table 11-4 Numbers of Games in World Series Contests

Games played                      4       5       6       7
Actual World Series contests     19      21      22      37
Expected proportion             2/16    4/16    5/16    5/16

REQUIREMENT CHECK (1) We begin by noting that the observed numbers of games are not randomly selected from a larger population. However, we treat them as a random sample
for the purpose of determining whether they are typical results that might be obtained from such a random sample. (2) The data consist of frequency counts. (3) Each expected frequency is at least 5, as will be shown later in this solution. All of the requirements are satisfied.

Step 1: The original claim is that the actual numbers of games fit the distribution indicated by the expected proportions. Using subscripts corresponding to the number of games, we can express this claim as $p_4 = 2/16$ and $p_5 = 4/16$ and $p_6 = 5/16$ and $p_7 = 5/16$.

Step 2: If the original claim is false, then at least one of the proportions does not have the value as claimed.

Step 3: The null hypothesis must contain the condition of equality, so we have

$H_0$: $p_4 = 2/16$ and $p_5 = 4/16$ and $p_6 = 5/16$ and $p_7 = 5/16$
$H_1$: At least one of the proportions is not equal to the given claimed value.

Step 4: The significance level is $\alpha = 0.05$.

Step 5: Because we are testing a claim that the distribution of numbers of games in World Series contests is as claimed, we use the goodness-of-fit test described in this section. The $\chi^2$ distribution is used with the test statistic given earlier.

Step 6: Table 11-5 shows the calculations resulting in the test statistic of $\chi^2 = 7.885$. The critical value is $\chi^2 = 7.815$ (found in Table A-4 with $\alpha = 0.05$ in the right tail and degrees of freedom equal to k - 1 = 3). The Minitab display shows the value of the test statistic as well as the P-value of 0.048.

MINITAB
[Minitab display showing the test statistic and the P-value of 0.048]

Table 11-5 Calculating the $\chi^2$ Test Statistic for the Numbers of World Series Games

Number of Games   Observed Frequency O   Expected Frequency E = np   O - E      (O - E)^2   (O - E)^2 / E
      4                   19             99 · 2/16 = 12.3750          6.6250     43.8906       3.5467
      5                   21             99 · 4/16 = 24.7500         -3.7500     14.0625       0.5682
      6                   22             99 · 5/16 = 30.9375         -8.9375     79.8789       2.5819
      7                   37             99 · 5/16 = 30.9375          6.0625     36.7539       1.1880

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 7.885$$

Which Airplane Seats Are Safest?
Because most crashes occur during takeoff or landing, passengers can improve their safety by flying non-stop. Also, larger planes are safer. Many people believe that the rear seats are safest in an airplane crash. Todd Curtis is an aviation safety expert who maintains a database of airline incidents, and he says that it is not possible to conclude that some seats are safer than others. He says that each crash is unique, and there are far too many variables to consider. Also, Matt McCormick, a survival expert for the National Transportation Safety Board, told Travel magazine that “there is no one safe place to sit.” Goodness-of-fit tests can be used with a null hypothesis that all sections of an airplane are equally safe. Crashed airplanes could be divided into the front, middle, and rear sections. The observed frequencies of fatalities could then be compared to the frequencies that would be expected with a uniform distribution of fatalities. The $\chi^2$ test statistic reflects the size of the discrepancies between observed and expected frequencies, and it would reveal whether some sections are safer than others.

Step 7: The P-value of 0.048 is less than the significance level of 0.05, so there is sufficient evidence to reject the null hypothesis. (Also, the test statistic of $\chi^2 = 7.885$ is in the critical region bounded by the critical value of 7.815, so there is sufficient evidence to reject the null hypothesis.)
Step 8: There is sufficient evidence to warrant rejection of the claim that actual numbers of games in World Series contests fit the distribution indicated by the expected proportions given in Table 11-4.

This goodness-of-fit test suggests that the numbers of games in World Series contests do not fit the distribution expected from probability calculations. Different media reports have noted that seven-game series occur much more often than expected. The results in Table 11-4 show that seven-game series occurred 37% of the time, but they were expected to occur only 31% of the time. (A USA Today headline stated that “Seven-game series defy odds.”) So far, no reasonable explanations have been provided for the discrepancy.

In Figure 11-4 we graph the expected proportions of 2/16, 4/16, 5/16, and 5/16 along with the observed proportions of 19/99, 21/99, 22/99, and 37/99, so that we can visualize the discrepancy between the distribution that was claimed and the frequencies that were observed. The points along the red line represent the expected proportions, and the points along the green line represent the observed proportions. Figure 11-4 shows disagreement between the expected proportions (red line) and the observed proportions (green line), and the hypothesis test in Example 2 shows that the discrepancy is statistically significant.

Figure 11-4 Observed and Expected Proportions in the Numbers of World Series Games
[Line graph: proportion (vertical axis, 0.1 to 0.4) against number of games in World Series (horizontal axis), with one line for the observed proportions and one for the expected proportions.]

P-Values

Computer software automatically provides P-values when conducting goodness-of-fit tests. If computer software is unavailable, a range of P-values can be found from Table A-4. Example 2 resulted in a test statistic of $\chi^2 = 7.885$, and if we refer to Table A-4 with 3 degrees of freedom, we find that the test statistic of 7.885 lies between the table values of 7.815 and 9.348. So, the P-value is between 0.025 and 0.05. In this case, we might state that “P-value < 0.05.” The
Minitab display shows that the P-value is 0.048. Because the P-value is less than the significance level of 0.05, we reject the null hypothesis. Remember, “if the P (value) is low, the null must go.”

Rationale for the Test Statistic: Examples 1 and 2 show that the $\chi^2$ test statistic is a measure of the discrepancy between observed and expected frequencies. Simply summing the differences between observed and expected values does not result in an effective measure because that sum is always 0. Squaring the O - E values provides a better statistic. (The reasons for squaring the O - E values are essentially the same as the reasons for squaring the $x - \bar{x}$ values in the formula for standard deviation.) The value of $\sum (O - E)^2$ measures only the magnitude of the differences, but we need to find the magnitude of the differences relative to what was expected. This relative magnitude is found through division by the expected frequencies, as in the test statistic.

The theoretical distribution of $\sum (O - E)^2 / E$ is a discrete distribution because the number of possible values is finite. The distribution can be approximated by a chi-square distribution, which is continuous. This approximation is generally considered acceptable, provided that all expected values E are at least 5. (There are ways of circumventing the problem of an expected frequency that is less than 5, such as combining categories so that all expected frequencies are at least 5. Also, there are other methods that can be used when not all expected frequencies are at least 5.)

The number of degrees of freedom reflects the fact that we can freely assign frequencies to k - 1 categories before the frequency for every category is determined. (Although we say that we can “freely” assign frequencies to k - 1 categories, we cannot have negative frequencies nor can we have frequencies so large that their sum exceeds the total of the observed frequencies for all categories combined.)
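The rationale above can be traced with the data of the two examples. The short Python sketch below is an illustration and not part of the original text. The helper name `discrepancy_measures` is hypothetical. The last-digit frequencies are the observed values from Table 11-3, the World Series counts and proportions come from Table 11-4, and the critical values 16.919 and 7.815 are the Table A-4 values quoted in the examples.

```python
# Observed frequencies for last digits 0 through 9, and for series of 4 to 7 games.
digits_observed = [7, 14, 6, 10, 8, 4, 5, 6, 12, 8]
games_observed = [19, 21, 22, 37]
games_proportions = [2 / 16, 4 / 16, 5 / 16, 5 / 16]

def discrepancy_measures(observed, expected):
    """Return the three candidate measures of discrepancy and the degrees of freedom."""
    diffs = [o - e for o, e in zip(observed, expected)]
    return {
        "sum_of_differences": sum(diffs),                  # always 0, so not useful
        "sum_of_squares": sum(d ** 2 for d in diffs),      # magnitude only
        "chi_square": sum(d ** 2 / e for d, e in zip(diffs, expected)),  # relative to E
        "degrees_of_freedom": len(observed) - 1,           # k - 1
    }

# Equal expected frequencies: E = n/k.
n_digits = sum(digits_observed)
digits_expected = [n_digits / len(digits_observed)] * len(digits_observed)

# Unequal expected frequencies: E = np.
n_games = sum(games_observed)
games_expected = [n_games * p for p in games_proportions]

examples = [
    ("Last digits of weights", digits_observed, digits_expected, 16.919),
    ("World Series games", games_observed, games_expected, 7.815),
]
for name, observed, expected, critical_value in examples:
    m = discrepancy_measures(observed, expected)
    decision = "reject H0" if m["chi_square"] > critical_value else "fail to reject H0"
    print(f"{name}: sum(O - E) = {m['sum_of_differences']:.4f}, "
          f"sum((O - E)^2) = {m['sum_of_squares']:.4f}, "
          f"chi-square = {m['chi_square']:.3f} with {m['degrees_of_freedom']} df, "
          f"critical value = {critical_value}: {decision}")
```

In both cases the plain sum of differences is 0, while the chi-square values agree with the test statistics of 11.250 and 7.885 reported in the examples.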
USING TECHNOLOGY

STATDISK First enter the observed frequencies in the first column of the Data Window. If the expected frequencies are not all equal, enter a second column that includes either expected proportions or actual expected frequencies. Select Analysis from the main menu bar, then select the option Goodness-of-Fit. Choose between “equal expected frequencies” and “unequal expected frequencies” and enter the data in the dialog box, then click on Evaluate.

MINITAB Enter observed frequencies in column C1. If the expected frequencies are not all equal, enter them as proportions in column C2. Select Stat, Tables, and Chi-Square Goodness-of-Fit Test. Make the entries in the window and click on OK.

EXCEL First enter the category names in one column, enter the observed frequencies in a second column, and use a third column to enter the expected proportions in decimal form (such as 0.20, 0.25, 0.25, and 0.30). If using Excel 2010 or Excel 2007, click on Add-Ins, then click on DDXL; if using Excel 2003, click on DDXL. Select the menu item of Tables. In the menu labeled Function Type, select Goodness-of-Fit. Click on the pencil icon for Category Names and enter the range of cells containing the category names, such as A1:A5. Click on the pencil icon for Observed Counts and enter the range of cells containing the observed frequencies, such as B1:B5. Click on the pencil icon for Test Distribution and enter the range of cells containing the expected proportions in decimal form, such as C1:C5. Click OK to get the chi-square test statistic and the P-value.

TI-83/84 PLUS Enter the observed frequencies in list L1, then identify the expected frequencies and enter them in list L2. With a TI-84 Plus calculator, press STAT, select TESTS, select $\chi^2$GOF-Test, then enter L1 and L2 and the number of degrees of freedom when prompted. (The number of degrees of freedom is 1 less than the number of categories.)
With a TI-83 Plus calculator, use the program X2GOF. Press PRGM, select X2GOF, then enter L1 and L2 when prompted. Results will include the test statistic and P-value.

Basic Skills and Concepts

Statistical Literacy and Critical Thinking

Goodness-of-Fit A New York Times/CBS News Poll typically involves the selection of random digits to be used for telephone numbers. The New York Times states that “within each (telephone) exchange, random digits were added to form a complete telephone number, thus permitting access to listed and unlisted numbers.” When such digits are randomly generated, what is the distribution of those digits? Given such randomly generated digits, what is a test for “goodness-of-fit”?

Appendix D

correlation between movie budget amounts and the amounts that the movies grossed. 31. r = 0.319. Critical values: r = ±0.254 (approximately). P-value = 0.017. There is sufficient evidence to support the claim of a linear correlation between the numbers of words spoken by men and women who are in couple relationships. 33. A linear correlation between two variables does not necessarily mean that one of the variables is the cause of (or directly affects) the other variable. 35. Averages tend to suppress variation among individuals, so a linear correlation among averages does not necessarily mean that there is a linear correlation among individuals. 37. a. 0.942 b. 0.839 c. 0.9995 (largest) d. 0.983 e. -0.958

Section 10-3

ŷ represents the predicted value of cholesterol. The predictor variable represents weight. The response variable represents cholesterol level. If r is positive, the regression line has a positive slope and it rises from left to right. If r is negative, the slope of the regression line is negative and it falls from left to right. 6.40 people. 76.3 beats per minute. ŷ = 3.00 + 0.500x. The data have a pattern that is not a straight line. 11. a. ŷ = 0.264 + 0.906x b. ŷ = 2 + 0x (or ŷ = 2) c. The results are very different, indicating that one point can dramatically affect
the regression equation. 13. ŷ = -0.162 + 0.0101x; $1.68 (which might be rounded up to the more convenient value of $1.75). 15. ŷ = 43.6 + 1.31x; 163.2 mm Hg. 17. ŷ = -157 + 40.2x; 205 kg. 19. ŷ = -1240 + 7.07x; $690. 21. ŷ = 2060 - 0.186x; $1615. The predicted cost of $1615 is very different from the actual cost of $982. 23. ŷ = 10.5 + 0.0109x; 14.5°. Yes, the predicted temperature is the same as the actual temperature. 25. ŷ = 0.494 + 0.000490x; 0.519. The predicted proportion of wins is reasonably close to the actual proportion of 0.543. 27. ŷ = 71.8 + 0.0286x; 103 (or the known mean IQ score of 100). 29. ŷ = 20.6 + 1.38x; $186 million. 31. ŷ = 13,400 + 0.302x; 15,200 words in a day. 33. With b1 = 0, the regression line is horizontal so that different values of x result in the same y value, and there is no correlation between x and y. 35. The residual plot does not suggest that the regression equation is a bad model, because there is no obvious pattern and the residual plot does not become thicker (or thinner). The scatterplot suggests that the regression equation is a bad model because the points do not fit the pattern of a straight line very well.

RESIDUAL PLOT

Section 10-4

It is the standard error of estimate, which is a measure of the differences between the observed weights and the weights predicted from the regression equation. Unlike the predicted value of 180 lb, the prediction interval provides us with a range of likely weights, so that we have a sense of how accurate the prediction is likely to be. The terminology of prediction interval is used for an interval estimate of a variable, whereas the terminology of confidence interval is used for an interval estimate of a parameter. 0.762; 76.2%. 0.748; 74.8%. r = -0.806 (r is negative because the highway fuel consumption amounts decrease as the weights of the cars increase, as shown by the fact that the slope of the regression line is negative). The critical value of r is between 0.361 and 0.335 (assuming a 0.05 significance
level). P-value = 0.000. There is sufficient evidence to support a claim of a linear correlation between the weights of cars and their highway fuel consumption amounts. 11. 27.028 mi/gal. 13. a. 2.64829 b. 0.0800433 c. 2.728333 d. 0.9706622 e. 0.1414596. 15. a. 8880.182 b. 991.1515 c. 9871.333 d. 0.8995929 e. 15.74128. 17. a. $1.72 b. $1.27 < y < $2.17. 19. a. 205 kg b. 157 kg < y < 253 kg. 21. $1.32 < y < $2.72. 23. $0.12 < y < $0.90. 25. -0.229 < β0 < 0.298; 0.738 < β1 < 1.15.

Section 10-5

b1 is the coefficient of x1 and β1 is the coefficient of x1, but β1 is a population parameter obtained by using all of the data in the population, whereas b1 is a sample statistic that is an estimate of β1. The value of b1 is found from the available sample data. No, because the data are categorical (or qualitative). The methods of this section require quantitative data. (Exceptions can occur with logistic regression.) Nicotine = 1.59 + 0.0231 Tar - 0.0525 CO, or ŷ = 1.59 + 0.0231x1 - 0.0525x2, where ŷ represents the predicted amount of nicotine, x1 represents the amount of tar, and x2 represents the amount of carbon monoxide. No. The adjusted R² of 0.017 is too small and the P-value of 0.317 is too high. The multiple regression equation is not a good model for predicting the amount of nicotine. LP (list price), because it has the best combination of lowest P-value (0.000) and highest adjusted R² (0.990). 11. ŷ = 99.2 + 0.979 LP, because it has the lowest P-value of 0.000 and the highest adjusted R² of 0.990. 13. The best regression equation is ŷ = 0.127 + 0.0878x1 - 0.0250x2, where x1 represents tar and x2 represents carbon monoxide. It is best because it has the highest adjusted R² value of 0.927 and the lowest P-value of 0.000. It is a good regression equation for predicting nicotine content because it has a high value of adjusted R² and a low P-value. 15. The regression equation ŷ = 47.9 - 0.00440x1 - 0.823x2 (where x1 = weight and x2 = engine displacement) has the highest adjusted R² of 0.644 and the lowest P-value of 0.000. It would be better
to use the regression equation ŷ = 50.5 - 0.00587x1 (where x1 represents weight), because it also has a P-value of 0.000 and its adjusted R² value of 0.639 is only a little less than 0.644, but it is better to use a regression equation with fewer predictor variables. 17. For H0: β1 = 0, the test statistic is t = 5.486, the P-value is 0.000, and the critical values are t = ±2.110, so reject H0 and conclude that the regression coefficient of b1 = 0.707 should be kept. For H0: β2 = 0, the test statistic is t = -1.292, the P-value is 0.213, and the critical values are t = ±2.110, so fail to reject H0 and conclude that the regression coefficient of b2 = 0.164 should be omitted. It appears that the regression equation should include the height of the mother as a predictor variable, but the height of the father should be omitted. 19. ŷ = 3.06 + 82.4x1 + 2.91x2, where x1 represents sex and x2 represents age. Female: 61 lb; male: 144 lb. The sex of the bear does appear to have an effect on its weight. The regression equation indicates that the predicted weight of a male bear is about 82 lb more than the predicted weight of a female bear with other characteristics being the same.

Section 10-6

The value of R² = 1 indicates that the model fits the data perfectly. Given the inherent variation and changing economic conditions, it is very unlikely that the model will fit the data perfectly, so the claim is likely to be wrong. 8.2% of the variation in Super Bowl points can be explained by the quadratic model that relates the variable of year and the variable of points scored. Because such a small percentage of such variation is explained by the model, the model is not very useful. Linear: y = + 2x. Quadratic: d = -16t² + 500. Exponential: y = 0.158270(1.05935)^x, where 1960 is coded as With the high R² value of 0.970, the model appears to be good. The projected value for 2020 is $5.33. 11. Quadratic: y = 0.00802808x² + 2.45538x + 15.3223, where 1980 is coded as The quadratic model is not much better than the
linear, exponential, and power models. The projected number of 87 isn’t too far from the actual number of 92.
13. Quadratic: y = 4.90x² - 0.0286x + 0.00476. The model yields a distance of 705 m, but the building is only 50 m tall, so the distance cannot exceed 50 m.
15. Quadratic: y = 0.00447053x² + 0.0198551x + 13.8205, where 1950 is coded as 1, 1955 is coded as 2, and so on. The predicted temperature for 2010 is 14.8°C.
17. a. Exponential: y = 2^(2(x - 1)/3) (or y = (0.629961)(1.587401)^x for an initial value of 1 that doubles every 1.5 years) b. Exponential: y = (1.33197)(1.43035)^x, where 1971 is coded as … c. Moore’s law does appear to be working reasonably well. With R² = 0.988, the model appears to be very good.
19. a. 6641.8 b. 73.2 c. The quadratic sum of squares of residuals (73.2) is less than the sum of squares of residuals from the linear model (6641.8).
Chapter 10 Statistical Literacy and Critical Thinking
With the method of correlation discussed in Section 10-2, the objective is to determine whether there is an association between two variables. With the methods of Section 9-4, the objective is to make inferences about the mean of the differences between the values in the population of matched pairs.
There is sufficient evidence to support the claim of a linear correlation between chest size and weight. Although there is a linear correlation, we cannot conclude that a larger chest size is the cause of a larger weight.
She is not correct. The value of r = … indicates that there is sufficient evidence to support a claim of a linear correlation between the two sets of prices, but that does not necessarily mean that both sets of prices are the same. For example, if each ring at the discount company is priced at 50% of the Tiffany price, the value of r will be 1, but the corresponding prices are very different.
The conclusion is not necessarily correct. The value of r = … suggests that there is no linear correlation between the two variables, but there might be some other nonlinear correlation.
Chapter
10 Quick Quiz
An error was made in the calculations, because r must always be between -1 and 1.
There is sufficient evidence to support the claim of a linear correlation between the two variables.
True.
There is not sufficient evidence to support the claim of a linear correlation between the two variables.
False. There might be a nonlinear relationship between the two variables.
±0.514
r = -1
15
0.160
10. False.
Chapter 10 Review Exercises
a. The scatterplot suggests that there is not a linear correlation between the two variables. b. r = -0.254. Critical values: r = ±0.811 (assuming a 0.05 significance level). P-value = 0.627. There is not sufficient evidence to support a claim of a linear correlation between 8:00 AM temperatures and midnight temperatures. c. ŷ = 126 - 0.285x d. 98.45°F (the value of ȳ)
a. r = 0.522, the critical values are r = ±0.312 (assuming a 0.05 significance level), and the P-value is 0.001, so there is sufficient evidence to support a claim of a linear correlation between heights and weights of males. b. 27.2% c. ŷ = -139 + 4.55x d. 189 lb
a. The scatterplot shows that as length increases, weight also tends to increase, so there does appear to be a correlation between length and weight. b. r = 0.964. Critical values: r = ±0.878 (assuming a 0.05 significance level). P-value = 0.008. There is sufficient evidence to support a claim of a linear correlation between lengths of bears and their weights. c. ŷ = -468 + 12.3x d. 418 lb
a. The scatterplot suggests that there is not a linear correlation between upper leg lengths and heights of males. b. r = 0.723. Critical values: r = ±0.878 (assuming a 0.05 significance level). P-value = 0.168. There is not sufficient evidence to support a claim of a linear correlation between upper leg lengths and heights of males. c. ŷ = 97.3 + 1.74x d. 170.2 cm (the value of ȳ)
ŷ = 140 + 2.50x1 - 2.27x2; R² = 0.977; adjusted R² = 0.954; P-value = 0.023. Using a 0.05 significance level, the multiple regression equation is a good model and can
be used to predict the height of a male when given his upper leg length and arm circumference.
Chapter 10 Cumulative Review Exercises
From 1877: x̄ = 66.2 in.; median: 66 in.; s = 2.5 in. From recent results: x̄ = 68.1 in.; median: 68.5 in.; s = 3.6 in.
H0: μ1 = μ2. H1: μ1 < μ2. Test statistic: t = -1.372. Critical value: t = -1.833. P-value > 0.05 (Tech: 0.094). Fail to reject H0. There is not sufficient evidence to support the claim that males in 1877 had a mean height that is less than the mean height of males today.
H0: μ = 69.1. H1: μ < 69.1. Test statistic: t = -3.690. Critical value: t = -1.833. P-value < 0.005 (Tech: 0.0025). There is sufficient evidence to support the claim that heights of men from 1877 have a mean less than 69.1 in.
64.4 in. < μ < 68.0 in.
-1.2 in. < μ1 - μ2 < 5.0 in. (Tech: -1.0 in. < μ1 - μ2 < 4.8 in.) The confidence interval limits include 0, which indicates that the two population means could be the same, so there is not a significant difference between the two population means.
The two sets of sample data are not matched as required, so the value of the linear correlation coefficient is meaningless in this situation.
a. A statistic is a numerical measurement describing some characteristic of a sample, but a parameter is a numerical measurement describing some characteristic of a population. b. A simple random sample is one chosen in such a way that every possible sample of the same size has the same chance of being chosen. c. A voluntary response sample is one in which the respondents themselves decide whether to be included. Results from such samples are generally unsuitable because people with strong interests in the topic are more likely to respond, and the result is a sample that is not representative of the population.
Yes. Different explanations are possible, including these: 40 is more than two standard deviations away from the mean; only 0.26% of BMI values are 40 or above; 40 converts to a z score of 2.8, indicating that it is 2.8 standard deviations above the
mean.
a. 0.3446 b. 0.0548
10. 0.000207. Because the probability of getting four subjects with green eyes is so small, it is likely that the researcher did not randomly select the subjects.
Chapter 11 Answers
Section 11-2
The random digits should all be equally likely, so the distribution is uniform. The test for goodness-of-fit is a hypothesis test that the sample data agree with or fit the uniform distribution with all of the digits being equally likely.
O represents the observed frequencies, and they are 5, 8, 7, 9, 13, 17, 11, 10, 10, 12, 8, 10. E represents the expected frequencies, and the twelve expected frequencies are each 10.
Critical value: χ² = 16.919. P-value > 0.10 (Tech: 0.516). There is not sufficient evidence to warrant rejection of the claim that the observed outcomes agree with the expected frequencies. The slot machine appears to be functioning as expected.
Test statistic: χ² = 70.160. Critical value: χ² = 7.815. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the four categories are equally likely. The results appear to support the expectation that the frequency for the first category is disproportionately high.
Test statistic: χ² = 6.320. Critical value: χ² = 7.815. P-value > 0.05 (Tech: 0.097). There is not sufficient evidence to warrant rejection of the claim that the four categories are equally likely. The results do not support the expectation that the frequency for the first category is disproportionately high.
11. Test statistic: χ² = 5.860. Critical value: χ² = 11.071. (Tech: P-value = 0.3201.)
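The goodness-of-fit statistics reported in Section 11-2 all come from the same formula, χ² = Σ (O - E)²/E. As a minimal sketch (the function name is illustrative and does not come from the text), the following applies that formula to the twelve observed frequencies listed above, with each expected frequency equal to 10. The answer key itself does not report a test statistic for these counts, so the printed value is only the result of this calculation.

```python
def chi_square_gof(observed, expected):
    """Return the chi-square goodness-of-fit statistic, sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed frequencies quoted in the answer above; each expected frequency is 10.
observed = [5, 8, 7, 9, 13, 17, 11, 10, 10, 12, 8, 10]
expected = [10] * len(observed)

statistic = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1 for k categories

print(f"chi-square = {statistic:.3f}, df = {degrees_of_freedom}")
```

The statistic would then be compared with the right-tailed critical value for the chosen significance level and k - 1 degrees of freedom.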
There is not sufficient evidence to support the claim that the outcomes are not equally likely. The outcomes appear to be equally likely, so the loaded die does not appear to behave differently from a fair die.
13. Test statistic: χ² = 13.193. Critical value: χ² = 16.919. P-value > 0.10 (Tech: 0.154). There is not sufficient evidence to warrant rejection of the claim that the likelihood of winning is the same for the different post positions. Based on these results, post position should not be considered when betting on the Kentucky Derby race.
15. Test statistic: χ² = 1159.820. Critical value: χ² = 19.675. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that UFO sightings occur in the different months with equal frequency. July and August have the highest frequencies, and those summer months are times when people are outdoors more than other months.
17. Test statistic: χ² = 15.822. Critical value: χ² = 7.815. P-value < 0.005 (Tech: 0.001). There is sufficient evidence to warrant rejection of the claim that the observed frequencies agree with the proportions that were expected according to principles of genetics.
19. Test statistic: χ² = 6.682. Critical value: χ² = 11.071 (assuming a 0.05 significance level). P-value > 0.10 (Tech: 0.245). There is not sufficient evidence to warrant rejection of the claim that the color distribution is as claimed.
21. Test statistic: χ² = 3650.251. Critical value: χ² = 20.090. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the leading digits are from a population with a distribution that conforms to Benford’s law. It does appear that the checks are the result of fraud.
23. Test statistic: χ² = 49.689. Critical value: χ² = 20.090. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the leading digits are from a population with a distribution that conforms to Benford’s law. The contribution amounts do not appear to be legitimate.
25.
The test statistic changes from 11.250 to 247.280. (Tech: The P-value changes from 0.259 to 0.000.) The effect of the outlier is dramatic.
Section 11-3
P-value = 0.0000000000173. Because the P-value is so low, we should reject the claim that getting paralytic polio is independent of whether the child was treated with the Salk vaccine or was given a placebo. The Salk vaccine appears to be effective.
The P-value of 0.0000000000173 is the probability of getting sample results at least as extreme as those given in the contingency table, assuming that getting paralytic polio is independent of whether a child was treated with the Salk vaccine or was given a placebo.
Fail to reject the null hypothesis that home/visitor wins are independent of the sport. It appears that the home-field advantage does not depend on the sport.
Test statistic: χ² = 2.235. Critical value: χ² = 3.841. P-value > 0.10 (Tech: 0.135). There is not sufficient evidence to warrant rejection of the claim of independence between success in challenges and sex of the player. Neither sex appears to be more successful.
Test statistic: χ² = 25.571. Critical value: χ² = 3.841. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that whether a subject lies is independent of the polygraph test indication. The results suggest that polygraphs are effective in distinguishing between truths and lies, but there are many false positives and false negatives, so they are not highly reliable.
11. Test statistic: χ² = 0.751. Critical value: χ² = 3.841. P-value > 0.10 (Tech: 0.386). There is not sufficient evidence to warrant rejection of the claim of independence between the type of restoration and adverse health conditions. Amalgam restorations do not appear to affect health conditions.
13. Test statistic: χ² = 42.557. Critical value: χ² = 3.841. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the sentence is independent of the plea. The results encourage pleas for
guilty defendants.
15. Test statistic: χ² = 9.750. Critical value: χ² = 6.635. P-value < 0.005 (Tech: 0.002). There is sufficient evidence to warrant rejection of the claim that success is independent of the type of treatment. The results suggest that the surgery treatment is better.
17. Test statistic: χ² = 0.792. Critical value: χ² = 5.991. P-value > 0.10 (Tech: 0.673). There is not sufficient evidence to warrant rejection of the claim that the sex of the respondent is independent of the choice for the cause of global warming. Men and women appear to generally agree.
19. Test statistic: χ² = 42.568. Critical value: χ² = 9.210. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that experiencing an adverse reaction in the digestive system is independent of the treatment group. Treatments with Campral appear to be associated with a decrease in adverse effects on the digestive system.
21. Test statistic: χ² = 0.773. Critical value: χ² = 7.815. P-value > 0.10 (Tech: 0.856). There is not sufficient evidence to warrant rejection of the claim that getting an infection is independent of the treatment. The atorvastatin treatment does not appear to have an effect on infections.
23. Test statistic: χ² = 51.458. Critical value: χ² = 6.635. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women. It appears that the gender of the interviewer affected the responses of women.
25. Test statistics: χ² = 2.234562954 and z = 1.494845462, so that z² = χ². Critical values: χ² = 3.841 and z = ±1.96, so that z² = χ² (approximately).
Section 11-4
Neither format appears to be better. Considering the results from the discordant cases, we see that the frequencies of 31 and 33 are quite close, indicating that students performed about the same with both formats.
When trying to determine whether there is a
significant difference between the two different formats, we are not helped by subjects whose results from the two different formats are the same, as in the mastery/mastery results and the nonmastery/nonmastery results. We can detect a difference by analyzing those cases in which one of the formats resulted in mastery while the other format did not, as in the discordant pairs.
1017
652
b, c
11. 6.635
13. Test statistic: χ² = 0.237. Critical value: χ² = 3.841. P-value > 0.10 (Tech: 0.626). There is not sufficient evidence to warrant rejection of the claim that the following two proportions are the same: (1) the proportion of subjects with no hip fracture on the protected hip and a hip fracture on the unprotected hip; (2) the proportion of subjects with a hip fracture on the protected hip and no hip fracture on the unprotected hip. The hip protectors do not appear to be effective.
15. Test statistic: χ² = 2.382. Critical value: χ² = 3.841. (Tech: P-value = 0.123.) There is not sufficient evidence to warrant rejection of the null hypothesis that the following two proportions are the same: (1) the proportion of subjects with no cure on the fungicide-treated foot and a cure on the foot treated with a placebo; (2) the proportion of subjects with a cure on the fungicide-treated foot and no cure on the foot treated with a placebo. The fungicide treatment does not appear to be effective.
17. Test statistic: χ² = 6.750. Critical value: χ² = 3.841 (assuming a 0.05 significance level). (Tech: P-value = 0.009.)
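The McNemar statistics in this section depend only on the two discordant frequencies. As a rough sketch (the function and parameter names are illustrative, not from the text), the corrected statistic can be computed as χ² = (|b - c| - 1)²/(b + c), and the uncorrected version simply omits the -1. The example uses the discordant frequencies 31 and 33 quoted in the first answer of Section 11-4.

```python
def mcnemar_statistic(b, c, corrected=True):
    """Chi-square statistic for McNemar's test from the discordant counts b and c.

    With corrected=True the continuity correction (subtracting 1 from |b - c|)
    is applied; with corrected=False the uncorrected statistic is returned.
    """
    if b + c == 0:
        raise ValueError("at least one discordant pair is required")
    numerator = abs(b - c) - 1 if corrected else abs(b - c)
    return numerator ** 2 / (b + c)


# Discordant frequencies from the first answer in Section 11-4.
corrected = mcnemar_statistic(31, 33)
uncorrected = mcnemar_statistic(31, 33, corrected=False)

print(f"corrected = {corrected:.4f}, uncorrected = {uncorrected:.4f}")
```

Both values fall far below the critical value of 3.841 used in this section, which agrees with the conclusion that neither format appears to be better.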
Reject the null hypothesis that the following two proportions are the same: (1) the proportion of tumors with incorrect staging from MRI and correct staging from PET/CT; (2) the proportion of tumors with correct staging from MRI and incorrect staging from PET/CT. The PET/CT technology appears to be more accurate.
19. The uncorrected test statistic is 11.267. The uncorrected value is greater than the corrected value of 9.600. The conclusion is the same in this case. There could be cases in which the uncorrected test statistic leads to rejection of the null hypothesis while the corrected test statistic does not.
21. When rounded to three decimal places, the same P-value of 0.289 is obtained. With a P-value of 0.289, fail to reject the null hypothesis that the following two proportions are the same: (1) the proportion of subjects with one foot cured with the Pedacream treatment while the other foot is not cured with the Fungacream treatment; (2) the proportion of subjects with one foot not cured with the Pedacream treatment while the other foot is cured with the Fungacream treatment. There does not appear to be a significant difference between the two treatments.
Chapter 11 Statistical Literacy and Critical Thinking
The numbers are frequency counts for the six different categories corresponding to the six different cells in the table. The categories are nausea with Celebrex, nausea with Ibuprofen, and so on.
The frequency counts are categorized according to two different variables: (1) whether the subject experienced nausea; (2) whether the subject was treated with Celebrex, Ibuprofen, or a placebo.
No, we can only conclude that nausea is associated with the treatment, but we cannot attribute the cause of the nausea to the treatment.
If we assume that experiencing or not experiencing nausea is independent of the treatment (Celebrex, Ibuprofen, placebo), the expected number of subjects who experience nausea with the Celebrex treatment is 160.490.
Chapter 11 Quick Quiz
H0: p1 = p2 = p3
= p4 = p5 = p6 = p7. H1: At least one of the proportions is different from the others.
Observed: 40; expected: 30.857
χ² = 12.592
There is not sufficient evidence to warrant rejection of the claim that fatal DWI crashes occur equally on the different days of the week.
Right-tailed
Chi-square
H0: Response is independent of whether the person responding is a worker or senior-level boss. H1: Response and whether the person is a worker or senior-level boss are dependent.
χ² = 3.841
There is sufficient evidence to warrant rejection of the claim that response is independent of whether the person is a worker or senior-level boss. Response appears to be somehow associated with whether the person is a worker or senior-level boss.
10. There is not sufficient evidence to warrant rejection of the claim that response is independent of whether the person is a worker or senior-level boss. Response and whether the person is a worker or senior-level boss appear to be independent of each other.
Chapter 11 Review Exercises
Test statistic: χ² = 9.294. Critical value: χ² = 5.991. P-value < 0.01 (Tech: 0.0096). There is sufficient evidence to warrant rejection of the claim of independence between experiencing nausea and the type of treatment. The adverse reaction of nausea does not appear to be about the same for the different treatments.
Test statistic: χ² = 36.366. Critical value: χ² = 16.812. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that deaths from lightning occur on the days of the week with equal frequency. Sunday appears to have disproportionately more deaths from lightning, and that might be explained by the fact that many people engage in outdoor recreational activities on Sundays.
Test statistic: χ² = 51.270. Critical value: χ² = 9.488. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the participants fit the same distribution as the U.S. population. If study participants are not representative
of the population, the results might be misleading because some groups might have cancer rates different from others, and they might skew the outcomes.
Test statistic: χ² = 10.732. Critical value: χ² = 3.841. P-value < 0.005 (Tech: 0.001). There is sufficient evidence to warrant rejection of the claim that whether a subject smokes is independent of whether the subject was treated with bupropion hydrochloride or a placebo. It appears that the bupropion hydrochloride treatment is effective in the sense that it is better than a placebo, but that the treatment is not highly effective because many of those in the treatment group continued to smoke.
Test statistic: χ² = 16.264. Critical value: χ² = 3.841. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that the two given proportions are the same. The results indicate that when reporting about coughing of children, there is substantial disagreement between the children and their parents.
Chapter 11 Cumulative Review Exercises
a. Observational study b. Discrete c. Statistics d. The organizations releasing the data have a special interest in the topic and those who conducted the research may have been influenced by them.
H0: p1 = p2. H1: p1 ≠ p2. Test statistic: z = -20.35. Critical values: z = ±1.96. P-value: 0.0002 (Tech: 0.000). Reject H0. There is sufficient evidence to warrant rejection of the claim that the proportion of men who wash their hands is equal to the proportion of women who wash their hands. There is a significant difference between men and women.
Test statistic: χ² = 414.230. Critical value: χ² = 3.841. P-value < 0.005 (Tech: 0.000). There is sufficient evidence to warrant rejection of the claim that hand washing is independent of gender. It appears that whether a person washes their hands is related to the gender of the person.
First round: x̄ = 71.2, median: 71.5, range = 8.0, s = 3.2. Fourth round: x̄ = 70.3, median: 69.5, range = 4.0, s = 1.8. The fourth round scores are slightly lower and
closer together.
r = -0.406. Critical values: r = ±0.811. P-value = 0.425. There is not sufficient evidence to support the claim of a linear correlation between the first round scores and the fourth round scores.
67.8 < μ < 74.5. We have 95% confidence that the limits of 67.8 and 74.5 contain the true value of the mean first round golf score. This means that if we were to repeat the first round under the same conditions and construct the 95% confidence interval from the six scores, 95% of the confidence intervals would contain the true population mean score.
0.828 < p < 0.932. Given that such a high percentage of senior executives believe that a thank-you note is helpful, it would be wise to send a thank-you note after every job interview.
H0: p = 0.75. H1: p > 0.75. Test statistic: z = 3.68. Critical value: z = 2.33. P-value: 0.0001. Reject H0. There is sufficient evidence to support the claim that more than 75% of all senior executives believe that a thank-you note after a job interview increases the chances of the applicant.
a. 0.2033 (Tech: 0.2040) b. 0.0005 c. The result from part (a) is more relevant. The designers need to consider the upper leg lengths of individual men who will occupy the cockpit; a cockpit will never be occupied by a group of 16 men.
10. 0.677. It is not unusual, because the probability is not low, such as 0.05 or less.
Chapter 12 Answers
Section 12-2
a. The data are categorized according to the one characteristic or factor of epoch. b. The terminology of analysis of variance refers to the method used to test for equality of the three means. That method is based on two different estimates of the population variance.
We should reject the null hypothesis that the three epochs have the same mean skull breadth. There is sufficient evidence to conclude that at least one of the means is different from the others.
Test statistic: F = 9.4695. Critical value of F is approximately 3.3158 (Tech: 3.2849). P-value: 0.000562. Reject H0: μ1 = μ2 = μ3. There is sufficient evidence to warrant
rejection of the claim that the three books have the same mean Flesch Reading Ease score.
Test statistic: F = 0.3521. Critical value: F = 2.6626. P-value: 0.7877. Fail to reject H0: μ1 = μ2 = μ3 = μ4. There is not sufficient evidence to warrant rejection of the claim that the mean weight loss is the same for the diets. The diets do not appear to be very effective.
Test statistic: F = 5.313. Critical value of F is approximately 3.0718 (Tech: 3.0738). P-value: 0.010. Reject H0: μ1 = μ2 = μ3. There is sufficient evidence to warrant rejection of the claim that PG movies, PG-13 movies, and R movies have the same mean gross amount.
11. Test statistic: F = 0.3974. Critical value: F = 3.3541. P-value: 0.6759. Fail to reject H0: μ1 = μ2 = μ3. There is not sufficient evidence to warrant rejection of the claim that the different car categories have the same mean. These data do not suggest that larger cars are safer.
13. Test statistic: F = 27.2488. Critical value: F = 3.8853. P-value: 0.0000. Reject H0: μ1 = μ2 = μ3. There is sufficient evidence to warrant rejection of the claim that the three different miles have the same mean time. These data suggest that the third mile appears to take longer, and a reasonable explanation is that the third lap has a hill.
15. Test statistic: F = 18.9931. Critical value of F is approximately 3.1504 (Tech: 3.1239). P-value: 0.0000. Reject H0: μ1 = μ2 = μ3. There is sufficient evidence to warrant rejection of the claim that the three different types of cigarettes have the same mean amount of nicotine. Given that the king size cigarettes have the largest mean of 1.26 mg per cigarette, compared to the other means of 0.87 mg per cigarette and 0.92 mg per cigarette, it appears that the filters make a difference (although this conclusion is not justified by the results from analysis of variance).
17. The Tukey test results show that the mean from small cars is significantly different from the mean from large cars. The conclusion is the same as that obtained with the
Bonferroni test results.
Section 12-3
The data are categorized using the two different variables of sex and age bracket.
Because each cell contains the same number of sample values (five), this is a balanced design.
Test statistic: F = 0.24. P-value: 0.625. Fail to reject the null hypothesis of no interaction effect. There is not sufficient evidence to warrant rejection of the claim that heights are not affected by an interaction between sex and age bracket. There does not appear to be an effect from an interaction between sex and age bracket.
Test statistic: F = 2.95. P-value: 0.091. Fail to reject the null hypothesis of no effect from age bracket. There is not sufficient evidence to support the claim that age bracket (under 30, over 30) has an effect on height.
Test statistic: F = 2.25. P-value: 0.159. Fail to reject the null hypothesis of no effect from the type of car. There is not sufficient evidence to support the claim that whether the car is foreign or domestic has an effect on head injury measurements.
11. Test statistic: F = 3.73. P-value: 0.0291. Reject the null hypothesis of no interaction effect. There is sufficient evidence to warrant rejection of the claim that measurements of self-esteem are not affected by an interaction between the subject’s self-esteem and the target’s self-esteem. There does appear to be an effect from an interaction between the self-esteem of the subject and their perceived self-esteem of the target.
13. Test statistic: F = 1.5824. P-value: 0.2132. Fail to reject the null hypothesis of no effect from self-esteem of the subject. There is not sufficient evidence to support the claim that the self-esteem of the subject (low, medium, high) has an effect on the measurements of self-esteem.
15. For interaction, the test statistic is F = 41.38 and the P-value is 0.000, so there is a significant interaction effect. The rating appears to be affected by an interaction between the use of the supplement and the amount of whey.
17. Row factor: Test statistic is F = 0.10
and P-value = 0.771, so fail to reject the null hypothesis of no effect from the row factor of supplement; there is not sufficient evidence to support the claim that ratings are affected by the use of the supplement. Column factor: Test statistic is F = 1.23 and P-value = 0.434, so fail to reject the null hypothesis of no effect from the column factor of the amount of whey; there is not sufficient evidence to support the claim that ratings are affected by the amount of whey.
Chapter 12 Statistical Literacy and Critical Thinking
Because two of the samples are from the same source at the same times, they are not independent. One-way analysis of variance should not be used because it has a strict requirement of independent data sets.
One-way analysis of variance is used with sample data categorized according to one factor, but two-way analysis of variance is used with sample data categorized according to two different factors.
No, because the MPAA ratings of PG, PG-13, and R are qualitative data, but methods of one-way analysis of variance or two-way analysis of variance require quantitative data.
Using the least expensive cars could easily result in biased samples consisting of cars that include fewer safety features and are not as well built, so such biased samples should not be used for making inferences about all cars.
Chapter 12 Quick Quiz
Test a null hypothesis that three or more samples are from populations with equal means.
Right-tailed
Smaller P-values
F = 8.98
There is sufficient evidence to reject the null hypothesis that the three populations of grade level reading scores have the same mean. It appears that at least one of the books has a population mean grade level reading score that is different from the other population means.
With one-way analysis of variance, the different samples are categorized using only one factor, but with two-way analysis of variance, the sample data are categorized into different cells determined by two different factors.
Fail to reject
the null hypothesis of no interaction. There does not appear to be an effect due to an interaction between sex and major.
There is not sufficient evidence to support a claim that the length estimates are affected by the sex of the subject.
10. There is not sufficient evidence to support a claim that the length estimates are affected by the subject’s major.
Chapter 12 Review Exercises
a. One (type of diet) b. One-way analysis of variance c. Because the P-value is high, it appears that the four samples have means that do not differ by significant amounts. It appears that the mean ages of the four treatment groups are about the same. d. A small P-value would indicate that at least one of the treatment groups has a mean age that is significantly different from the others, so we would not know if differences from the diet treatments are due to the diets or to differences in age. A small P-value would undermine the effectiveness of the experiment.
Test statistic: F = 54.70. P-value: 0.000. Reject H0: μ1 = μ2 = μ3. There is sufficient evidence to warrant rejection of the claim that the different car categories have the same mean weight. Because the sample means are 3095.0 lb (four cylinders), 3835.0 lb (six cylinders), and 4159.3 lb (eight cylinders), it does appear that cars with more cylinders weigh more.
Test statistic: F = 0.37. P-value: 0.698. Fail to reject the null hypothesis of no interaction effect. There is not sufficient evidence to warrant rejection of the claim that left femur loads are not affected by an interaction between the type of car (foreign, domestic) and size of the car (small, medium, large). There does not appear to be an effect from an interaction.
Test statistic: F = 1.82. P-value: 0.202. Fail to reject the null hypothesis of no effect from the type of car. There is not sufficient evidence to support the claim that whether the car is foreign or domestic has an effect on left femur load measurements.
Test statistic: F = 0.48. P-value: 0.632. Fail to reject the null
hypothesis of no effect from the size of the car. There is not sufficient evidence to support the claim that whether the car is small, medium, or large has an effect on left femur load measurements.
Test statistic: F = 0.5010. Critical value of F is approximately 3.1504 (Tech: 3.1239). P-value: 0.6080. Fail to reject H0: μ1 = μ2 = μ3. There is not sufficient evidence to warrant rejection of the claim that the three different types of cigarettes have the same mean amount of carbon monoxide. It appears that the filters do not make a difference in the amount of carbon monoxide.
For interaction, the test statistic is F = 0.8733 and the P-value is 0.3685, so there is no significant interaction effect. For gender, the test statistic is F = 0.0178 and the P-value is 0.8960, so there is no significant effect from gender. For smoking, the test statistic is F = 3.0119 and the P-value is 0.1082, so there is no significant effect from smoking.
Test statistic: F = 3.1095. P-value = 0.0506. Using a 0.05 significance level, fail to reject H0: μ1 = μ2 = μ3. There is not sufficient evidence to warrant rejection of the claim that the three groups have the same mean longevity. The survival times appear to have means that are the same.
Chapter 12 Cumulative Review Exercises
a. 15.5, 13.1, 22.7 b. 9.7, 9.0, 18.6
Test statistic: t = -1.383. Critical values: t = ±2.160 (assuming a 0.05 significance level). (Tech: P-value = 0.1860.)
Fail to reject H0: μ1 = μ2. There is not sufficient evidence to support the claim that there is a difference between the means for the two groups. Normal, because the histogram is approximately bell-shaped. 12.3 years < μ < 18.7 years. a. Interval b. Continuous c. 97.87°F d. 98.00°F e. 2.40°F f. 0.71°F g. 0.50 degree Rating Frequency 1.0–1.9 2.0–2.9 3.0–3.9 4.0–4.9 5.0–5.9 6.0–6.9 7.0–7.9 8.0–8.9 4 9 7 No. The shape of the histogram is not reasonably close to the bell shape of a normal distribution. The sample means of 6.63, 6.43, and 6.18 do not appear to be very different. Analysis of variance test: Test statistic is F = 0.0886 and the P-value is 0.915, so fail to reject H0: μ1 = μ2 = μ3 and conclude that there is not sufficient evidence to warrant rejection of the claim that PG movies, PG-13 movies, and R movies have the same mean viewer rating. a. r = 0.603. Critical values: r = ±0.878 (assuming a 0.05 significance level). P-value = 0.282. There is not sufficient evidence to support the claim of a linear correlation between amounts of discarded paper and plastic. b. ŷ = -2.17 + 0.405x c. No. Because there is not a linear correlation between the amounts of discarded paper and plastic, the regression line does not fit the data well enough. 10. Test statistic: χ² = 2.909. Critical value: χ² = 5.991. P-value > 0.10 (Tech: 0.2335). There is not sufficient evidence to warrant rejection of the claim of independence between injury category and whether the firearm was a handgun or a rifle or shotgun.

Chapter 13 Answers

Section 13-2

The sign test does not have a requirement that the sample data come from a population with a particular distribution. The alternative hypothesis is that the proportion of girls is greater than 0.5, but the sample proportion of girls is 39/211, which is less than 0.5. The test is not necessary because there is no way that a sample proportion of 39/211 (or 0.185) could ever support a claim that the population proportion is greater than 0.5. The test statistic of x =
is less than or equal to the critical value of. There is sufficient evidence to warrant rejection of the claim of no difference. There does appear to be a difference. The test statistic of z = -0.48 is not in the critical region bounded by z = -1.96 and 1.96. There is not sufficient evidence to warrant rejection of the claim of no difference. There does not appear to be a difference. The test statistic of x = is less than or equal to the critical value of. There is sufficient evidence to reject the claim of no difference. Based on the sample data, it appears that when awards are presented, there is a difference between the age of the best actress and the best actor. 11. The test statistic of x = is not less than or equal to the critical value of. There is not sufficient evidence to reject the claim of no difference. There is not sufficient evidence to support the claim that the time spent depicting the use of alcohol is less than the time spent depicting the use of tobacco. 13. The test statistic of z = -9.09 is in the critical region bounded by z = -2.33. There is sufficient evidence to warrant rejection of the claim of no difference. The YSORT method appears to be effective in increasing the likelihood of a boy. 15. The test statistic of z = -0.88 is not in the critical region bounded by z = -1.645. There is not sufficient evidence to warrant rejection of the claim of no difference. It appears that women do not have the ability to predict the sex of their babies. 17. The test statistic of z = -2.37 is not in the critical region bounded by z = -2.575 and 2.575. There is not sufficient evidence to warrant rejection of the claim that the median is equal to 5.670 g. The quarters appear to be minted according to specifications. 19. The test statistic of z = -5.32 is in the critical region bounded by z = -1.96 and 1.96. There is sufficient evidence to warrant rejection of the claim that the median amount of Coke is equal to 12 oz. Consumers are not being cheated because they are generally getting
more than 12 oz of Coke, not less. 21. Second approach: The test statistic of z = -4.29 is in the critical region bounded by z = -1.645, so the conclusions are the same as in Example. Third approach: The test statistic of z = -2.82 is in the critical region bounded by z = -1.645, so the conclusions are the same as in Example. The different approaches can lead to very different results; see the test statistics of -4.61, -4.29, and -2.82. The conclusions are the same in this case, but they could be different in other cases.

Section 13-3

The sign test converts the sample data to positive and negative signs, whereas the Wilcoxon signed-ranks test uses ranks, so conclusions will sometimes be different. Because the Wilcoxon signed-ranks test uses more information about the data, it is likely to yield better results. n = 10. The critical value is T =. Test statistic: T = 1.5. Critical value: T =. Fail to reject the null hypothesis that the population of differences has a median of 0. Based on the sample data, it appears that when Friday falls on the 13th, the numbers of hospital admissions are not affected. Test statistic: T =. Critical value: T = 25. Reject the null hypothesis that the population of differences has a median of 0. Based on the sample data, it appears that when awards are presented, there is a difference between the age of the best actress and the best actor. Reject the null hypothesis that the population of differences has a median of 0. Based on the sample data, it appears that when awards are presented, there is a difference between the age of the best actress and the best actor. Test statistic: T = 13.5. Critical value: T =. Fail to reject the null hypothesis that the population of differences has a median of 0. There is not sufficient evidence to support the claim that the time spent depicting the use of alcohol is less than the time spent depicting the use of tobacco. Convert T = 196 to the test statistic z = -2.88. Critical values: z = ±2.575 (Tech: P-value = 0.004.)
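The conversion of T = 196 to z = -2.88 quoted above relies on the normal approximation for the Wilcoxon signed-ranks statistic. The following is a minimal Python sketch of that conversion. The sample size n = 40 is an assumption: it is not stated in this answer key and was chosen only because it reproduces the printed z value. The function names are illustrative.

```python
import math

def signed_ranks_z(T, n):
    """Normal approximation for the Wilcoxon signed-ranks statistic T
    computed from n nonzero differences."""
    mean_T = n * (n + 1) / 4
    sd_T = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (T - mean_T) / sd_T

def two_tailed_p(z):
    """Two-tailed P-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# n = 40 is an assumed sample size; it does not appear in the source text.
z = signed_ranks_z(T=196, n=40)
p = two_tailed_p(z)
print(round(z, 2), round(p, 3))
```

Under that assumed sample size, the rounded results should agree with the printed test statistic and the Tech P-value.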
There is sufficient evidence to warrant rejection of the claim that the median is equal to 5.670 g. The quarters do not appear to be minted according to specifications. Convert T = 15.5 to the test statistic z = -4.82. Critical values: z = ±1.96 (Tech: P-value = 0.000.) There is sufficient evidence to warrant rejection of the claim that the median amount of Coke is equal to 12 oz. Consumers are not being cheated because they are generally getting more than 12 oz of Coke, not less. a. 975 b. n(n + 1)/2 - k

Section 13-4

Because the samples are small and they are from populations not having normal distributions, the parametric t test should not be used. a. 500.5 b. No, we are testing the null hypothesis that the two populations have the same median, which is not specified. R1 = 692, R2 = 739, μR = 621, σR = 55.723, test statistic: z = 1.27. Critical values: z = ±1.96 (Tech: P-value = 0.2026.) Fail to reject the null hypothesis that the populations have the same median. R1 = 416, R2 = 325, μR = 468, σR = 33.045, test statistic: z = -1.57. Critical values: z = ±1.96 (Tech: P-value = 0.1156.) Fail to reject the null hypothesis that the populations have the same median. R1 = 863, R2 = 412, μR = 637.5, σR = 51.539, test statistic: z = 4.38. Critical value: z = 2.33 (Tech: P-value = 0.0000.) Reject the null hypothesis that the populations have the same median. There is sufficient evidence to support the claim that the nonfiltered king size cigarettes have a median amount of nicotine that is greater than the median amount of nicotine in the 100 mm filtered cigarettes. 11. R1 = 487.5, R2 = 142.5, μR = 414, σR = 28.775, test statistic: z = 2.55. Critical value: z = 1.645 (Tech: P-value = 0.0053.)
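The rank-sum answers in Section 13-4 each report a rank sum, its mean and standard deviation, and a z score. A short Python sketch of how those pieces fit together follows, using the first answer (R1 = 692). The sample sizes n1 = 23 and n2 = 30 are assumptions inferred from the printed values (they are consistent with R1 + R2 = 1431 = 53 · 54 / 2); they are not stated in this answer key, and the function name is illustrative.

```python
import math

def rank_sum_z(R1, n1, n2):
    """Return (mu_R, sigma_R, z) for the Wilcoxon rank-sum test,
    where R1 is the sum of ranks for the sample of size n1."""
    mu_R = n1 * (n1 + n2 + 1) / 2
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (R1 - mu_R) / sigma_R
    return mu_R, sigma_R, z

# n1 = 23 and n2 = 30 are inferred sample sizes, not given in the source.
mu_R, sigma_R, z = rank_sum_z(R1=692, n1=23, n2=30)
print(mu_R, round(sigma_R, 3), round(z, 2))
```

With those assumed sample sizes, the computed mean, standard deviation, and z score should match the printed 621, 55.723, and 1.27 to rounding.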
Reject the null hypothesis that the populations have the same median. There is sufficient evidence to support the claim that movies with ratings of PG or PG-13 have a higher median gross amount than movies with R ratings. 13. z = -0.63. The test statistic is the same value with opposite sign.

Section 13-5

Because the values from each sample are not matched in any way with values from the other samples, the samples are independent. + + + 7.5 + 7.5 + 10 + 12 + 21.5 + 21.5 = 91 Test statistic: H = 6.6305. Critical value: χ² = 5.991 (Tech: P-value = 0.0363.) Reject the null hypothesis of equal medians. Interbreeding is suggested by the data. Test statistic: H = 1.2239. Critical value: χ² = 5.991 (Tech: P-value = 0.5423.) Fail to reject the null hypothesis of equal medians. The available data do not provide sufficient evidence to suggest that larger cars are safer. Test statistic: H = 20.9247. Critical value: χ² = 5.991 (Tech: P-value: 0.000.) Reject the null hypothesis of equal medians. It appears that cars with more cylinders produce larger amounts of greenhouse gases. 11. Test statistic: H = 27.9098. Critical value: χ² = 5.991 (Tech: P-value: 0.000.)
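The Kruskal-Wallis answers above compare H with the chi-square critical value 5.991. As a hedged illustration, the Python sketch below assumes 2 degrees of freedom (three samples) and a 0.05 significance level; neither is stated explicitly in these answers, but both are consistent with the printed critical value. For 2 degrees of freedom the chi-square right-tail area has the closed form exp(-H/2), so no statistics library is needed. The function names are illustrative.

```python
import math

def chi2_df2_p_value(H):
    """Right-tail area of the chi-square distribution with 2 degrees
    of freedom, which has the closed form exp(-H / 2)."""
    return math.exp(-H / 2)

def chi2_df2_critical(alpha):
    """Critical value with right-tail area alpha (valid for df = 2 only)."""
    return -2 * math.log(alpha)

# alpha = 0.05 and df = 2 are assumptions consistent with the printed 5.991.
critical = chi2_df2_critical(0.05)
for H in (6.6305, 1.2239):
    p = chi2_df2_p_value(H)
    decision = "reject" if H > critical else "fail to reject"
    print(H, round(p, 4), decision)
```

The shortcut applies only when there are exactly three samples; for other numbers of samples a general chi-square routine would be needed.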
Reject the null hypothesis of equal medians. There is sufficient evidence to warrant rejection of the claim that the three different types of cigarettes have the same median amount of nicotine. It appears that the filters make a difference. 13. Corrected H is 5.835 (using ΣT = 282 and N = 30), which is not very different from the uncorrected H of 5.774. c. ±0.197 d. ±0.322 rs = -0.929. Critical values: -0.786, 0.786. Reject the null hypothesis of ρs = 0. There is sufficient evidence to support the claim of a correlation between the two judges. Examination of the results shows that the first and third judges appear to have opposite rankings. rs = -0.007. Critical values: -0.587, 0.587. Fail to reject the null hypothesis of ρs = 0. There is not sufficient evidence to support the claim of a correlation between the conviction rates and recidivism rates. Conviction rates do not appear to be related to recidivism rates. rs = 0.664. Critical values: -0.618, 0.618. Reject the null hypothesis of ρs = 0. There is sufficient evidence to support the claim of a correlation between quality score and cost. It appears that higher quality is associated with higher cost, so you can expect to get higher quality by purchasing a more expensive LCD TV. rs = Critical values: -0.886, 0.886. Reject the null hypothesis of ρs = 0. There is sufficient evidence to conclude that there is a relationship between overhead widths of seals from photographs and the weights of the seals. rs = 0.231. Critical values: -0.264, 0.264. Fail to reject the null hypothesis of ρs = 0. There is not sufficient evidence to conclude that there is a correlation between the numbers of words spoken by men and women who are in couple relationships. Using Formula 10-1 for the case of ties among ranks, rs = 0.109. Using the formula for the case of no ties among ranks (even though there are ties), rs = 0.385. There is a substantial difference between the two results. The first result is better because it is exact (except for rounding), whereas
0.385 is not exact. Using a 0.05 significance level, the critical values are -0.274 and 0.274, so the two results lead to different conclusions.

Section 13-6

The methods of Section 10-3 should not be used for predictions. The regression equation is based on a linear correlation between the two variables, but the methods of this section do not require a linear relationship. The methods of this section could suggest that there is a correlation with paired data associated by some nonlinear relationship, so the regression equation would not be a suitable model for making predictions. The symbol rs is used to represent the rank correlation coefficient computed from the given sample data, and the symbol ρs represents the rank correlation coefficient of the paired data for the entire population. The subscript s is used so that the rank correlation coefficient can be distinguished from the linear correlation coefficient r. The subscript does not represent the standard deviation s. It is used to honor Charles Spearman, who introduced the rank correlation method. rs = Critical values are -0.886 and 0.886. Reject the null hypothesis of ρs = 0. There is sufficient evidence to support a claim of a correlation between distance and time. a. ±0.521 b. ±0.521

Section 13-7

No. The runs test can be used to determine whether the sequence of World Series wins by American League teams and National League teams is random, but the runs test does not show whether the proportion of wins by the American League is significantly greater than 0.5. No. There might be other problems with the data selection process. For example, a voluntary response sample might appear to be random, but it would not be suitable for most statistical purposes. n1 = 9, n2 = 7, G = 4, critical values: 4, 14. Reject randomness. There is sufficient evidence to reject the claim that the genders of the younger winners occur randomly. n1 = 19, n2 = 14, G = 15, critical values: 11, 23. Fail to reject randomness. There is not sufficient evidence to warrant
rejection of the claim that we elect Democrats and Republicans in a random sequence. n1 = 20, n2 = 10, G = 16, critical values: 9, 20. Fail to reject randomness. There is not sufficient evidence to reject the claim that the dates before and after July are randomly selected. 11. n1 = 23, n2 = 18, G = 16, μG = 21.1951, σG = 3.11307. Test statistic: z = -1.67. Critical values: z = ±1.96 (Tech: P-value = 0.0952.) Fail to reject randomness. The sequence appears to be random. The runs test suggests that the sequence appears to be random, but it does not test for disproportionately more occurrences of one of the two categories, so the runs test does not suggest that either conference is superior. 13. The median is 1956, and the value of 1956 is deleted to yield n1 = 21, n2 = 21, G = 2, μG = 22, σG = 3.20061. Test statistic: z = -6.25. Critical values: z = ±1.96 (Tech: P-value = 0.0000.) Reject randomness. The sequence does not appear to be random when considering values above and below the median. There appears to be an upward trend, so the stock market appears to be a profitable investment. 15. b. The 84 sequences yield 2 runs of 2, 7 runs of 3, 20 runs of 4, 25 runs of 5, 20 runs of 6, and 10 runs of 7. c. With P(2 runs) = 2/84, P(3 runs) = 7/84, P(4 runs) = 20/84, P(5 runs) = 25/84, P(6 runs) = 20/84, and P(7 runs) = 10/84, each of the G values of 3, 4, 5, 6, 7 can easily occur by chance, whereas G = 2 is unlikely because P(2 runs) is less than 0.025. The lower critical value of G is therefore 2, and there is no upper critical value that can be equaled or exceeded. d. Critical value of G = 2 agrees with Table A-10. The table lists as the upper critical value, but it is impossible to get runs using the given elements.

Chapter 13 Statistical Literacy and Critical Thinking

A nonparametric test is a hypothesis test that does not require that sample data come from a population having a normal distribution or any other specific distribution. A parametric test is a hypothesis test with a
requirement that the sample data are from a population having a particular distribution, and the normal distribution is most common. There is no difference. They are different names for the same category of hypothesis tests that do not require that populations have normal distributions or any other particular distribution. A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned a rank of 1, the second item is assigned a rank of 2, and so on. The given values have ranks of 1.5, 1.5, 5, 4, and 3, respectively. Efficiency is a measure of how much stronger sample evidence must be for the nonparametric test to produce the same results as a corresponding parametric test. For example, the sign test has an efficiency of 0.63, meaning that under the same conditions, the sign test requires 100 sample observations to achieve the same results as 63 sample observations analyzed with a corresponding parametric test. We should use a nonparametric test when requirements of the corresponding parametric test (such as the requirement of a normally distributed population) are not satisfied.

Chapter 13 Quick Quiz

2, 1, 3.5, 3.5, The runs test for randomness is used to test the null hypothesis that sample data in a sequence are in a random order. This test is based on sample data that have two characteristics. Rank correlation can be used in a wider variety of circumstances than the parametric method of linear correlation. Rank correlation does not require a normal distribution for any population. Rank correlation can be used to detect some (not all) relationships that are not linear. When compared to the parametric test, the nonparametric test requires stronger evidence, such as a larger sample or greater differences, in order to reject a null hypothesis. Distribution-free test. False. Four independent samples. Two independent samples. Matched pairs. 10. Four independent samples.

Chapter 13 Review Exercises

The test statistic of z = -1.77
is not less than or equal to the critical value of z = -1.96. Fail to reject the null hypothesis of p = 0.5. There is not sufficient evidence to warrant rejection of the claim that in each World Series, the American League team has a 0.5 probability of winning. The test statistic of x = is less than or equal to the critical value of. Reject the null hypothesis that the median equals 98.6°F. There is sufficient evidence to warrant rejection of the claim that body temperatures have a median equal to 98.6°F. The test statistic of T = 1.5 is less than or equal to the critical value of. Reject the null hypothesis that the median equals 98.6°F. There is sufficient evidence to warrant rejection of the claim that body temperatures have a median equal to 98.6°F. n1 = 8, n2 = 2, and the number of runs is G =. The critical values are and. Fail to reject the null hypothesis of randomness. There is not sufficient evidence to warrant rejection of the claim that the sequence of odd and even digits is random. rs = 0.714. Critical values: -0.738, 0.738. Fail to reject the null hypothesis of ρs = 0. There is not sufficient evidence to support the claim that there is a correlation between the student ranks and the magazine ranks. When ranking colleges, students and the magazine do not appear to agree. Test statistic: x = is not less than or equal to the critical value of. Fail to reject the null hypothesis that the population of differences has a median of 0. Based on the sample data, it appears that the predictions are reasonably accurate, because there does not appear to be a difference between the actual high temperatures and the predicted high temperatures. The test statistic of T = 11.5 is not less than or equal to the critical value of. Fail to reject the null hypothesis that the population of differences has a median of 0. Based on the sample data, it appears that the predictions are reasonably accurate, because there does not appear to be a difference between the actual high temperatures and the predicted
high temperatures. Test statistic: H = 0.559. Critical value: χ² = 5.991 (Tech: P-value = 0.756.) Fail to reject the null hypothesis of equal medians. The available data do not provide sufficient evidence to suggest that larger cars are safer. R1 = 273, R2 = 192, μR = 232.5, σR = 24.1091, test statistic: z = 1.68. Critical values: z = ±1.96 (Tech: P-value = 0.0930.) Fail to reject the null hypothesis that the populations have the same median. There is not sufficient evidence to support the claim that the region becomes cooler as the fall season progresses. 10. n1 = 17, n2 = 13, and the number of runs is G = 12. The critical values are 10 and 22. Fail to reject the null hypothesis of randomness. There is not sufficient evidence to warrant rejection of the claim that the sequence of odd and even temperatures is random.

Chapter 13 Cumulative Review Exercises

Mean: 213.2 mg; median: 155.5 mg; range: 469.0 mg; standard deviation: 150.1 mg; variance: 22,529.5 mg². r = 0.689. Critical values: r = ±0.632. P-value = 0.0277. There is sufficient evidence to support the claim of a linear correlation between cholesterol and weight. ŷ = 104 + 0.158x; 120 lb. rs = 0.212. Critical values: -0.648, 0.648. Fail to reject the null hypothesis of ρs = 0. There is not sufficient evidence to support the claim that there is a correlation between cholesterol and weight. Test statistic: F = 8.9785. Critical value of F is between 3.2317 and 3.3158 (Tech: 3.2849). P-value: 0.0008. Reject the null hypothesis of equal population means. There is sufficient evidence to warrant rejection of the claim that the three books have the same mean Flesch-Kincaid Grade Level. The three books do not appear to be written with the same reading level. Test statistic: H = 13.218. Critical value: χ² = 5.991 (Tech: P-value = 0.0013.)
Reject the null hypothesis of equal medians. The available data provide sufficient evidence to warrant rejection of the claim that the three books have the same median Flesch-Kincaid Grade Level. The three books do not appear to be written with the same reading level. H0: μ1 = μ2. H1: μ1 ≠ μ2. Test statistic: t = 1.820. Critical values: t = ±2.201 (Tech: t = ±2.123). P-value > 0.05 (Tech: 0.0878). Fail to reject H0. There is not sufficient evidence to reject the claim that the samples of Flesch-Kincaid Grade Level scores from Clancy and Rowling have the same mean. R1 = 175.5, R2 = 124.5, μR = 150, σR = 17.32051, test statistic: z = 1.47. Critical values: z = ±1.96 (Tech: P-value = 0.1410.) Fail to reject the null hypothesis that the samples of Flesch-Kincaid Grade Level scores from Clancy and Rowling are from populations having the same median. 10. There must be an error, because the rates of 13.7% and 10.6% are not possible with samples of size 100.

Chapter 14 Answers

Section 14-2

No. If the process is within statistical control, we know that the cans are being filled in such a way that the mean and range do not vary too much, but being within statistical control indicates nothing about whether the product specifications are being met. It is possible to be within statistical control while underfilling all of the cans by a substantial amount. Random variation is the type of variation inherent in the process that is not capable of filling every can of Coke with exactly 12 ounces. There are different examples of assignable variation, so answers will vary, but one example is the changing amounts of Coke that result from machinery that becomes worn through extensive use. a. Within statistical control b. Not applicable c. The variation is too large, so that some cans are overfilled too much while others are underfilled too much. a. Out of statistical control b. There are consecutive points all below the centerline. There are also consecutive points all above the centerline. c. The process does not
appear to be behaving as it should. There appears to be a shift in the process values, with all of the low values occurring at the beginning. x̿ = 319.26, R̄ = 6.69. R chart: LCL = 1.49, UCL = 11.89. x̄ chart: LCL = 317.20, UCL = 321.32. 11. The process variation is out of statistical control. All three criteria for statistical stability are violated. There is a pattern of increasing values that is not random, there are points lying beyond the control limits, and there are points lying below the centerline. 13. x̿ = 2781.71, R̄ = 1729.38. R chart: LCL = 0, UCL = 3465.67. x̄ chart: LCL = 1946.42, UCL = 3617.00. 15. The process mean is within statistical control. 17. x̿ = 123.66, R̄ = 0.375. R chart: LCL = 0, UCL = 0.79. x̄ chart: LCL = 123.45, UCL = 123.88. 19. There is no pattern suggesting that the process is not within statistical control. Based on the run chart, the process appears to be within statistical control. 21. The s chart is very similar to the R chart. Because the s chart uses sample standard deviations instead of ranges, it is much more difficult to construct than the R chart. The process appears to be within statistical control. 11. The process is within statistical control. College enrollments are fluctuating between 60% and 68%; ideally, the rate of college enrollments would be much higher. 13. The process is not within statistical control, because there are points lying beyond the upper and lower control limits, and there is a nonrandom pattern of proportions that consistently fall, then rise. It appears that higher proportions of people vote in years with a presidential election. Overall, the proportions of voters are inexcusably low. Less than 60% of those eligible actually vote, and that percentage should be substantially higher.

Section 14-3

No. It is possible that the proportions of defects in repeated samplings behave in a way that makes the process within statistical control, but the actual proportions of defects could be very high, such as 90%, so that almost all of
the tablets fail to meet the manufacturing specifications. The LCL of -0.022 is replaced by 0, because it is impossible to have negative values of proportions. UCL = 0.062. Within statistical control. The process appears to be out of control because there is a pattern of an upward trend, there is a point lying beyond the upper control limit, and there are consecutive points that are all above the centerline. 15. Except for the vertical scale, the control chart is identical to the one included with Example.

Chapter 14 Statistical Literacy and Critical Thinking

Statistical process control consists of methods used to monitor data over time and to ensure that the data do not go out of statistical control by meeting any of the specific out-of-control criteria being used. Due to a wide variety of causes, such as changing employees and worn machinery, it is possible that the manufacturing process could result in far too many defective tablets with too much or too little of the drug atorvastatin. If the process is not monitored, the manufacturer Pfizer could be forced to halt production of Lipitor, with the result that Pfizer would lose some or all of the annual $13 billion income from that drug. The process could go out of statistical control because of a changing mean, or increased variation, or both. No. The limits of 9.5 mg and 10.5 mg are specifications that are independent of upper and lower control limits used for x̄ charts and R charts. It is possible for the manufacturing process to be within control with almost all tablets containing an unacceptable amount of atorvastatin, such as mg instead of some value near the desired level of 10 mg. R̄ = 62; LCL = 4.7; UCL = 119.3. The variation of the process is within statistical control because none of the three out-of-control conditions are met.

Chapter 14 Quick Quiz

Process data are data arranged according to some time sequence. They are measurements of a characteristic of goods or services that result from some combination of
equipment, people, materials, methods, and conditions. Random variation is due to chance, but assignable variation results from causes that can be identified, such as defective machinery or untrained employees. There is a pattern, trend, or cycle that is obviously not random. There is a point lying outside of the region between the upper and lower control limits. There are eight consecutive points all above or all below the centerline. An R chart uses ranges to monitor variation, but an x̄ chart uses sample means to monitor the center (mean) of a process. No. The R chart shows a point lying beyond the upper control limit. R̄ = 21.2. In general, a value of R̄ is found by first finding the range for the values within each individual subgroup; the mean of those ranges is the value of R̄. The process mean is within statistical control. x̿ = 6.45. In general, a value of x̿ is found by first finding the mean of the values within each individual subgroup; the mean of those subgroup means is the value of x̿. A p chart is a control chart of the proportions of some attribute, such as defective items. 10. False. x̿ = 256.95; LCL = 230.97; UCL = 282.93. The mean of the process is within statistical control because none of the three out-of-control conditions are met. The proportions do not appear to be within statistical control because there is a pattern of declining proportions. The chart suggests that in recent years, the homicide rate has been declining, which is good, but it has leveled off in the last four years.

Chapter 14 Review Exercises

Based on the run chart from the first 21 axial loads, there does not appear to be a pattern suggesting that the process is out of statistical control. The process is out of statistical control, because there is a pattern of increasing variation. If left uncorrected, the process will eventually result in production runs with very high proportions of defective tablets.

Chapter 14 Cumulative Review Exercises

a. r = 0.691. Critical values: r = ±0.632. P-value =
0.0270. There is sufficient evidence to support the claim of a linear correlation between carbon dioxide concentration and temperature. b. No. The presence of a correlation between two variables does not imply that one of the variables is the cause of the other. c. ŷ = 9.54 + 0.0135x d. The predicted value of 13.47°C is somewhat close to the actual temperature of 13.88°C. Because r = 0.691 is not extremely high, predicted values are not likely to be extremely accurate. The process is out of statistical control because there are points lying beyond the upper control limit. Also, there appears to be a pattern of increasing variation. 0.0256 < p < 0.0414. H0: p = 0.03. H1: p > 0.03. Test statistic: z = 0.92. Critical value: z = 1.645. P-value: 0.1794. Fail to reject H0. There is not sufficient evidence to support the claim that the rate of defects is greater than 3%. a. 1/256 b. 1/256 c. 1/128 a. 15.87% b. 4.4 in. and 7.6 in. No. The online poll resulted in a voluntary response sample that represents only those who chose to respond. It is likely that those who chose to respond are not representative of the larger population. The values appear to be from a population with a distribution that is not normal. The presence of 12 zeros among the 30 values causes a histogram to be skewed to the right instead of being bell-shaped. A normal quantile plot shows a pattern of points that is not reasonably close to a straight-line pattern. x̄ = 66.0 cents; median = 21.5 cents; s = 108.1 cents. 10. H0: p = 0.5. H1: p > 0.5. Test statistic: z = 1.10 (using p̂ = 18/30 = 0.6). Critical value: z = 1.645. P-value: 0.1357 (Tech: 0.1367). Fail to reject H0. There is not sufficient evidence to support the claim that most students have some change in their possession.
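The last answer above (18 of 30 students, H0: p = 0.5) can be checked with the normal approximation for a one-proportion test. The Python sketch below is illustrative only. It assumes that the printed table P-value comes from z rounded to two decimal places and that the Tech P-value comes from the unrounded z; that reading of the two printed values is an assumption, and the function names are hypothetical.

```python
import math

def one_proportion_z(x, n, p0):
    """z test statistic for a sample proportion x/n against the claimed p0."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def right_tail_p(z):
    """Right-tail area under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = one_proportion_z(x=18, n=30, p0=0.5)
p_table = right_tail_p(round(z, 2))  # assumed: table lookup uses z rounded to 1.10
p_tech = right_tail_p(z)             # assumed: software uses the unrounded z
print(round(z, 2), round(p_table, 4), round(p_tech, 4))
```

Because both P-values exceed 0.05, either route leads to the same decision of failing to reject H0.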
