Chapter 10 Inference for a Single Quantitative Variable

Are mean yearly earnings of all students at a university less than $5,000? (Photo: Juan Silva/Getty Images)

Thinking about the population of students’ earnings: an inference problem

In Chapter 9, we established methods for drawing conclusions about the unknown population proportion, based on a sample proportion, for situations where the variable of interest was categorical. Now we turn to situations where the single variable of interest is quantitative, as in the above question about earnings. In this case, we focus on the mean as the main summary of interest, and we want to infer something about the unknown population mean, based on an observed sample mean.

Much of what we established for categorical variables still applies. The underlying concepts in performing inference for single categorical variables continue to apply when we perform inference for single quantitative variables. The mechanics of the inference procedures, on the other hand, require us to summarize and standardize a different type of variable: quantitative instead of categorical.

The main summaries for quantitative samples and populations were first introduced in an earlier chapter:

Population mean is \(\mu\) (called “mu”), a parameter.
Sample mean is \(\bar{x}\) (called “x-bar”), a statistic.
Population standard deviation is \(\sigma\) (called “sigma”), a parameter.
Sample standard deviation is \(s\), a statistic.

When we perform inference about means, a different distribution often applies instead of the standard normal (z) distribution with which we are so familiar by now. The sample mean \(\bar{x}\) standardizes in some situations to z but in others to a new type of random variable, t. We will begin to address the distinction between inference with z and with t in Example 10.4 on page 464, after establishing \(\bar{x}\) to be our point estimate for \(\mu\).

First, let’s return to our opening question about students’ earnings, and phrase it in three different ways to parallel the three forms of inference that will be presented.

EXAMPLE 10.1 Point Estimate, Confidence Interval, and Hypothesis Test Questions

Background: In a representative sample of 446 students at a university, mean earnings for the previous year were $3,776.

Questions: What is our best guess for the mean earnings of all students at this university the previous year? What interval should contain the mean earnings of all students at this university for the previous year? Is there evidence that the mean of earnings of all students at this university was less than $5,000?

Response: We will answer these questions as we develop a theory for performing the three types of inference about an unknown population mean.

10.1 Inference for a Mean When Population Standard Deviation Is Known or Sample Size Is Large

When we used the rules of probability in Chapter 8 to summarize the behavior of sample mean for random samples of a certain size taken from a population with mean \(\mu\), we arrived at results concerning center, spread, and shape of the distribution of sample mean. As far as the center is concerned, we stated that the distribution of sample mean \(\bar{x}\) has a mean equal to the population mean \(\mu\).

EXAMPLE 10.2 Sample Mean as a Point Estimate for the Population Mean

Background: The unknown mean of earnings of all students at a university is denoted \(\mu\).

Questions: If we take repeated random samples of a given size from the population of all students, where should their sample mean earnings be centered? If we take a single sample, what is our best guess for \(\mu\)?
Responses: Some random samples would have a sample mean less than \(\mu\) and others greater than \(\mu\), but overall the sample means should average out to \(\mu\). Therefore, sample mean earnings is our best guess for unknown population mean earnings \(\mu\).

Practice: Try Exercise 10.1(a) on page 477.

This example justifies answering the first question of Example 10.1 in the most natural way: Our best guess for the mean earnings of all students at that university would be the mean earnings of the sampled students, $3,776. Probability theory assures us that sample mean \(\bar{x}\) is an unbiased estimator for population mean \(\mu\) as long as our sample is random and earnings are reported accurately. Just as we saw for proportions, we must exercise caution whenever we make generalizations from a sample mean to the mean of the larger population.

EXAMPLE 10.3 When a Sample Mean Is a Poor Estimate for the Population Mean

Background: When students come for help in office hours during a given week, a professor asks them how much time they took to complete the previous week’s assignment. Their mean completion time was 3.5 hours.

Question: What is the professor’s best guess for the mean time all of her students took to complete the assignment?

Response: In this situation, sample mean is not an unbiased estimator for the population mean because the sampled students were not a representative sample of the larger population of students. If the professor were to guess 3.5, it would almost surely be an overestimate, because students coming for help in office hours would tend to take longer to get their homework done. There is no “best guess” in this case.

Practice: Try Exercise 10.1(b) on page 477.

Students in office hours: a biased sample? (Photo: © Tom Stewart/CORBIS)
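The contrast between Examples 10.2 and 10.3 can be illustrated with a small simulation. The sketch below is an illustration only: the population of completion times, the gamma shape used to generate it, and the rule that only the slower half of the class comes to office hours are all assumptions invented for the demonstration, not data from the examples.

```python
import random
import statistics

random.seed(10)  # fixed seed so the run is repeatable

# Hypothetical population: completion times (hours) for 10,000 students.
# The gamma shape is an assumption, chosen only to give positive, right-skewed times.
population = [random.gammavariate(6, 0.4) for _ in range(10_000)]
mu = statistics.fmean(population)  # known here only because we built the population

# Assumed selection rule: only the slower half of the class visits office hours.
slower_half = sorted(population)[len(population) // 2:]

def average_of_sample_means(group, n, repetitions):
    """Average the sample means of repeated random samples of size n from group."""
    means = [statistics.fmean(random.sample(group, n)) for _ in range(repetitions)]
    return statistics.fmean(means)

random_avg = average_of_sample_means(population, 25, 2000)
office_hours_avg = average_of_sample_means(slower_half, 25, 2000)

print(f"population mean:                {mu:.2f}")
print(f"average of random-sample means: {random_avg:.2f}")
print(f"average of office-hours means:  {office_hours_avg:.2f}")
```

Under these assumptions, the means of random samples average out very close to \(\mu\), while the office-hours means settle well above it no matter how many samples are taken. Taking more biased samples does not repair the bias.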
Just as we saw in the case of sample proportion as a point estimate in Chapter 9, use of sample mean as a point estimate for population mean is of limited usefulness. Because the distribution of sample mean is continuous, there are infinitely many sample means possible, and our point estimate is practically guaranteed to be incorrect. Instead of making a single incorrect guess at the unknown population mean, we should either produce an interval that is likely to contain it, or conclude whether or not a proposed value of the population mean is plausible. In other words, our goal is to perform inference in the form of confidence intervals or hypothesis tests.

The key to how well we can close in on the value of population mean lies in the spread of the distribution of sample mean. Once we begin to set up confidence intervals or carry out hypothesis tests to close in on the value of \(\mu\), standard deviation enters in and the process will differ, depending on whether or not the population standard deviation is known. To keep things as simple as possible at the beginning, we will assume at first that \(\sigma\) is known, in which case the standardized value of sample mean follows a standard normal (z) distribution. If \(\sigma\) is not known, we must resort to standardizing with sample standard deviation \(s\) instead of population standard deviation \(\sigma\), and the standardized value no longer follows a z distribution, but rather what is called a t distribution.

To clarify the contrast between situations where the standardized sample mean follows a z or t distribution, we consider two situations that are identical except that the population standard deviation is known in the first case and unknown in the second.

EXAMPLE 10.4 Inference about a Mean When the Population Standard Deviation Is Known Versus Unknown

Background: In a sample of 12 students attending a particular community college, the mean travel time to school was found to be 18 minutes.

Question: For which of these two scenarios would inference be based on z and for which would it be based on t?
1. We want to draw conclusions about the mean travel time of all students at that college; travel time for all students at that college is assumed to have a standard deviation of \(\sigma = 20\) minutes.
2. We want to draw conclusions about the mean travel time of all students at that college; travel time for the sample was found to have a standard deviation of \(s = 20\) minutes.

Response: The first problem would be answered using inference based on z because the population standard deviation is known. The second problem would be answered using inference based on t because the population standard deviation is unknown.

Practice: Try Exercise 10.3 on page 477.

A Confidence Interval for the Population Mean Based on z

We begin with situations where the population standard deviation is known, as in the first problem in Example 10.4. We will also include situations where the sample is so large that the population standard deviation \(\sigma\) can be very closely approximated with the sample standard deviation \(s\). First, we see how to set up a range of plausible values for an unknown population mean \(\mu\), based on the sample mean \(\bar{x}\). After that, we will see how to test a hypothesis to decide whether or not to believe that the population mean \(\mu\) equals some proposed value.

Because knowledge about an unknown population mean comes from understanding the distribution of sample mean, we should recall the most important results about sample mean obtained in Chapter 8.

Reviewing Results for the Distribution of Sample Mean

If random samples of size \(n\) are taken from a population with mean \(\mu\) and standard deviation \(\sigma\), then the distribution of sample mean \(\bar{x}\) has
mean \(\mu\)
standard deviation \(\sigma/\sqrt{n}\)
shape approximately normal if sample size \(n\) is large enough

The claim about the mean of \(\bar{x}\) requires that the sample be representative. The claim about the standard deviation requires the population to be at least 10 times the sample size so that samples taken without replacement are roughly independent. The claim about the shape holds if \(n\) is large enough to offset non-normality in the shape of the underlying population.

95% Confidence Intervals with z

Calculations are simplified if we seek a 95% confidence interval for the mean. In this case, the multiplier is approximately 2, as long as the population standard deviation is known and a z distribution applies.

Let’s begin with a situation where both the population mean and standard deviation are known, and we use them to construct a probability interval for the sample mean when random samples of a given size are taken from that population.

EXAMPLE 10.5 A Probability Interval for the Sample Mean Based on a Known Population Mean

Background: Assume the distribution of IQ to be normal with a mean of 100 and a standard deviation of 15, illustrated in the graph on the left. Suppose a random sample of 9 IQs is observed. Then the mean IQ for that sample has a mean of 100, a standard deviation of \(15/\sqrt{9} = 15/3 = 5\), and a shape that is normal because IQs themselves are normally distributed. This distribution is shown in the graph on the right.

[Figure: two normal curves marked with the 68-95-99.7 Rule. Left: IQ, centered at 100, axis from 55 to 145. Right: sample mean IQ for samples of size n = 9, centered at 100, axis from 85 to 115.]

Question: What does the 95% part of the 68-95-99.7 Rule tell us about sample mean IQ in this situation?
Response: The rule tells us the probability is 95% that sample mean IQ in a sample of size 9 comes within 2 standard deviations of population mean IQ, where a standard deviation is 5 points: that is, in the interval (90, 110).

Practice: Try Exercise 10.4(b) on page 477.

A probability interval, such as the one we saw in Example 10.5, paves the way for our construction of a confidence interval. In order to make this transition for proportions when the variable of interest was categorical, we observed in Example 9.4 on page 392 that if a friend is within half a mile of your house, then your house is within half a mile of the friend. Similarly, if the sample mean falls within a certain distance of the population mean, then the population mean falls within the same distance of the sample mean. However, sample mean is a random variable that obeys the laws of probability, the formal study of random behavior. The population mean is a fixed parameter (even if its value is unknown) and it does not behave randomly. The correct way to shift from probability statements about the sample mean to inference statements about the population mean is to use the word “confidence,” as demonstrated in the next example.

EXAMPLE 10.6 The Margin of Error in a Confidence Interval for the Mean

Background: The distribution of IQ scores is normal with a standard deviation of 15.

Question: If we take a random sample of 9 IQs and use the sample mean IQ to set up a 95% confidence interval for the unknown population mean IQ, what would be the margin of error?
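The step from a probability statement about \(\bar{x}\) to a confidence statement about \(\mu\) can also be seen in a simulation. The sketch below assumes the IQ setting of Example 10.5 (normal population, mean 100, standard deviation 15, samples of size 9) and reuses the 10-point distance behind that example's interval (90, 110); the variable names and the number of repetitions are illustrative choices. Here \(\mu\) stays fixed and only \(\bar{x}\) changes from sample to sample, yet the interval built around each \(\bar{x}\) should capture \(\mu\) in roughly 95% of the repetitions.

```python
import random
import statistics

random.seed(105)  # fixed seed so the run is repeatable

MU, SIGMA, N = 100, 15, 9   # IQ setting assumed from Example 10.5
DISTANCE = 10               # 2 standard deviations of sample mean, 2 * 5 points

repetitions = 10_000
captured = 0
for _ in range(repetitions):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    # "xbar is within 10 of MU" is the same event as "MU is within 10 of xbar"
    if xbar - DISTANCE <= MU <= xbar + DISTANCE:
        captured += 1

coverage = captured / repetitions
print(f"intervals capturing the population mean: {coverage:.3f}")
```

The randomness lives entirely in the samples, which is why the 95% is attached to the procedure ("confidence") rather than to \(\mu\) itself.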
Response: The standard deviation of sample mean is population standard deviation divided by square root of sample size, or \(15/\sqrt{9} = 5\). The margin of error is 2 standard deviations (of sample mean), or \(2(5) = 10\).

Practice: Try Exercise 10.6 on page 478.

Now we are almost ready to present an extremely useful formula for constructing a 95% confidence interval for population mean. Because the formula requires the sample mean to be normal, you need to refer back to the guidelines established on page 361 of Chapter 8. These guidelines, which are now modified because in reality we can only assess the shape of the sample data, not the population, must be met for all of the procedures presented in this chapter. In practice, especially if the sample size is small, you should always check a graph of the sample data to justify use of these methods.

Guidelines for Approximate Normality of Sample Mean

We can assume the shape of the distribution of sample mean for random samples of size \(n\) to be approximately normal if
a graph of the sample data appears approximately normal; or
a graph of the sample data appears fairly symmetric (not necessarily single-peaked) and \(n\) is at least 15; or
a graph of the sample data appears moderately skewed and \(n\) is at least 30.

If these guidelines are followed, then we may construct the confidence interval.

95% Confidence Interval for Population Mean When Population Standard Deviation Is Known

An approximate 95% confidence interval for unknown population mean \(\mu\) based on sample mean \(\bar{x}\) from a random sample of size \(n\) is

estimate ± margin of error = sample mean ± 2 standard deviations of sample mean

\[ \bar{x} \pm 2\frac{\sigma}{\sqrt{n}} \]

where \(\sigma\) is the population standard deviation.

Here is an illustration of the sample mean, its standard deviation, the margin of error, and the confidence interval:

[Figure: 95% confidence interval for population mean, centered at the estimate (sample mean \(\bar{x}\)); margin of error = 2 standard deviations; standard deviation = \(\sigma/\sqrt{n}\).]

LOOKING BACK: Notice the similarity between this confidence interval and the one for an unknown population proportion presented on page 396: sample proportion ± 2 standard deviations of sample proportion \(\approx \hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}\).

The above formula can be applied if a confidence interval is to be produced by hand. Otherwise, the interval can be requested using software by entering the data values and specifying the population standard deviation.

EXAMPLE 10.7 Confidence Intervals for a Mean by Hand or with Software

Background: A random sample of weights of female college students has been obtained:

110 110 112 120 120 120 125 125 130 130 132 133 134 135 135 135 145 148 159

Assume the standard deviation for weights of all female college students is 20 pounds.

Questions: How would we check background conditions and use the data to construct a 95% confidence interval for mean weight of all female college students by hand? How would we proceed if we were using software?
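As one possible version of the software route raised in the second question, here is a minimal sketch in Python. It is not the statistics package that produced the output quoted in this example; the function name `z_interval` and the reliance on Python's standard library are assumptions made for illustration. The multiplier is passed in explicitly so that the rounded value 2 and the more precise standard normal value can be compared.

```python
import math
import statistics

# Weights (pounds) of the 19 sampled female students, from the Background.
weights = [110, 110, 112, 120, 120, 120, 125, 125, 130, 130,
           132, 133, 134, 135, 135, 135, 145, 148, 159]
SIGMA = 20  # population standard deviation assumed in the Background

def z_interval(data, sigma, multiplier):
    """Return (lower, upper) for sample mean +/- multiplier * sigma / sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    margin = multiplier * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# By-hand version: multiplier rounded to 2.
hand_lower, hand_upper = z_interval(weights, SIGMA, 2)

# Software-style version: multiplier taken from the standard normal distribution.
z_star = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
soft_lower, soft_upper = z_interval(weights, SIGMA, z_star)

print(f"sample mean: {statistics.fmean(weights):.2f}")
print(f"multiplier 2:    ({hand_lower:.2f}, {hand_upper:.2f})")
print(f"multiplier {z_star:.2f}: ({soft_lower:.2f}, {soft_upper:.2f})")
```

Keeping the multiplier as a parameter makes the only difference between the two intervals visible: the data, \(\sigma\), and \(n\) are identical, and the endpoints shift slightly because 2 is a rounded stand-in for the standard normal value.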
Responses: First we need to consider the sample size and the shape of the distribution, to make sure that \(n\) is large enough to offset any non-normality, so that sample mean would be approximately normal. A stemplot can easily be constructed by hand, and we see that the distribution is reasonably normal. At any rate, the sample size (19) is large enough to offset the small amount of right skewness that we should expect to see in a distribution of weight values.

11 | 0 0 2
12 | 0 0 0 5 5
13 | 0 0 2 3 4 5 5 5
14 | 5 8
15 | 9

To set up the interval, we calculate the sample mean weight, 129.37. Since the population standard deviation is assumed to be 20 and the sample size is 19, our 95% confidence interval is

\[ 129.37 \pm 2\frac{20}{\sqrt{19}} = 129.37 \pm 9.18 = (120.19, 138.55) \]

If we had software at our disposal, we could start by producing a histogram of the weight values. Again, we would conclude that, although the shape is somewhat non-normal, 19 should be a large enough sample size to allow us to proceed.

[Figure: histogram of the 19 weight values; horizontal axis Weight (pounds) from 110 to 160, vertical axis Frequency.]

We can request a 95% confidence interval for the population mean, after entering the sample of 19 female weights listed above and specifying the population standard deviation \(\sigma\) to be 20.

One-Sample Z: Weight
The assumed sigma = 20
Variable   N    Mean     StDev   SE Mean   95.0% CI
Weight     19   129.37   12.82   4.59      (120.37, 138.37)

The 95% confidence interval for mean weight of all female college students is (120.37, 138.37). It is slightly different from the interval produced by hand because we used the multiplier 2 and the computer uses 1.96.

Practice: Try Exercise 10.8(c) on page 478.

Many teachers of statistics will agree that variability is the most important concept for students to understand. Individuals vary, and samples vary too, but sample mean does not vary as much as individual values. This is the reason why statistics is so useful: We can close in on an unknown population parameter by using the corresponding statistic from a random sample. Probability theory taught us the exact nature of the variability of sample mean: Its standard deviation is population standard deviation divided by square root of sample size. Thus, in a very straightforward way, larger samples lead to statistics that tend to be closer to the unknown parameters that they estimate.

The following discussion by four students reminds us that this variability can be assessed as the entire width of the confidence interval, or as the margin of error around the sample mean, or as the standard deviation of sample mean. The last of these is estimated as \(s/\sqrt{n}\) if \(\sigma\) is unknown.

Confidence Interval for a Mean: Width, Margin of Error, Standard Deviation, and Standard Error

Suppose the four students have been asked to find the margin of error in the confidence interval for mean female weight, with output as shown in Example 10.7:

The assumed sigma = 20
Variable   N    Mean     StDev   SE Mean   95.0% CI
Weight     19   129.37   12.82   4.59      (120.37, 138.37)

Adam: “It’s standard deviation, right?
So that would be 12.82.”

Brittany: “For a 95% confidence interval, the margin of error is twice the standard deviation, which is twice 12.82, or 25.64.”

Carlos: “The whole interval from 120 to 138 isn’t even that wide. You forgot to divide by the square root of the sample size. The standard deviation of sample mean is 12.82 divided by the square root of 19. That’s what you multiply by 2.”

Dominique: “That’s still not right. Remember we’re supposed to work with population standard deviation sigma, which is 20. The standard deviation of sample mean is 20 divided by square root of 19, which is 4.59. That’s the SE Mean in the output. The margin of error is 2 times that, or 9.18. The confidence interval is 129.37 plus or minus 9.18. That comes out to the interval from 120.19 to 138.55, which is just a little different from the output because the exact multiplier is 1.96, not 2.”

So far, we have required the population standard deviation \(\sigma\) to be known if we wanted to set up a 95% confidence interval for the unknown population mean using the multiplier 2 that comes from the z distribution. In fact, if the sample size is reasonably large (say, at least 30) then we can assume that the sample standard deviation \(s\) is close enough to \(\sigma\) that \(s/\sqrt{n}\) is approximately the same as the standard deviation \(\sigma/\sqrt{n}\) of \(\bar{x}\). Thus, the z multiplier can be used if \(\sigma\) is known or if \(n\) is large.

95% Confidence Interval for \(\mu\) When \(\sigma\) Is Unknown But \(n\) Is Large

If sample size \(n\) is fairly large (at least 30), an approximate 95% confidence interval for unknown population mean \(\mu\) based on sample mean \(\bar{x}\) and sample standard deviation \(s\) is

\[ \bar{x} \pm 2\frac{s}{\sqrt{n}} \]

Use of z probabilities simplifies matters when we hand-calculate a confidence interval for the mean based on a sample mean and sample standard deviation from a fairly large sample size. However, when we use software, we cannot request a z confidence interval or hypothesis test procedure unless we can specify the population standard deviation.

EXAMPLE 10.8 Using a z Multiplier If the Population Standard Deviation Is Unknown but the Sample Size Is Large

Background: In a representative sample of 446 students at a university, mean earnings for the previous year were $3,776. The standard deviation for earnings of all students is unknown, but the sample standard deviation is found to be $6,500.

Question: Can we use this information to construct a 95% confidence interval for mean earnings of all students at this university for the previous year?

Response: First, we note that although the distribution of earnings is extremely skewed, a sample of size 446 is so large that the Central Limit Theorem ensures the distribution of sample mean earnings to have a normal shape.

[Figure: histogram of earnings; horizontal axis Earned (in thousands) from 0 to 70, vertical axis Frequency from 0 to 150.]

Because \(n\) is so large, we can substitute sample standard deviation \(s = 6{,}500\) for the unknown population standard deviation \(\sigma\) and still use the multiplier 2 from the z distribution. Our 95% confidence interval for \(\mu\) is

\[ 3{,}776 \pm 2\frac{6{,}500}{\sqrt{446}} = 3{,}776 \pm 616 = (3{,}160,\ 4{,}392) \]

Practice: Try Exercise 10.9(b) on page 478.

Now we have two different reasons for checking the sample size \(n\). If \(\sigma\) is unknown and we want to set up a confidence interval by hand, we make sure \(n\) is large enough so that \(s\) is close enough to \(\sigma\) to allow for use of probabilities based on the z distribution instead of t.

Solutions to Selected Exercises

6.38 a independent b non-overlapping
7.4 a b because the distribution is skewed, not symmetric
7.5 a
6.39 0.02 is P(A given W) and 1.00 is P(W given A)
6.47 a 0.44 × 0.38 + 0.56 × 0.43 = 0.41 b gender
6.49 a P(Preg) = 0.5, P(not Preg) = 0.5, P(Pos given Preg) = 0.75, P(Pos given not Preg) = 0.25, P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.5(0.75) + 0.5(0.25) = 0.5, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.5(0.75)/0.5 = 0.75 b P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.8(0.75) +
0.2(0.25) = 0.65, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.8(0.75)/0.65 = 0.92 c P(Pos) = P(Preg) × P(Pos given Preg) + P(not Preg) × P(Pos given not Preg) = 0.2(0.75) + 0.8(0.25) = 0.35, P(Preg given Pos) = P(Preg and Pos)/P(Pos) = 0.2(0.75)/0.35 = 0.43 d As the probability of having the condition in question decreases, so does the probability of having the condition, given that one tests positive for the condition

Outcome   X   Probability
NNN       0   0.75 × 0.75 × 0.75 = 0.421875
NND       1   0.75 × 0.75 × 0.25 = 0.140625
NDN       1   0.75 × 0.25 × 0.75 = 0.140625
DNN       1   0.25 × 0.75 × 0.75 = 0.140625
NDD       2   0.75 × 0.25 × 0.25 = 0.046875
DND       2   0.25 × 0.75 × 0.25 = 0.046875
DDN       2   0.25 × 0.25 × 0.75 = 0.046875
DDD       3   0.25 × 0.25 × 0.25 = 0.015625

X   Probability
0   0.421875
1   3(0.140625) = 0.421875
2   3(0.046875) = 0.140625
3   0.015625

6.53 a
Expected    Liberal               Not Liberal
Year 1970   (60 × 100)/200 = 30   (140 × 100)/200 = 70
Year 2003   (60 × 100)/200 = 30   (140 × 100)/200 = 70
b c 6,000
6.55 a 0.361 × 0.316 = 0.114 b higher
b skewed right c 1;
7.7 a A mean could be calculated for the second question only, because it involves a quantitative variable b the principle of long-run observed outcomes
7.8 a 0.01 + 0.02 + 0.04 = 0.07 b Nonoverlapping “Or” Rule c The problem arises because of the way the values of X are assessed (self-ratings are biased higher than true ability) d because ratings of friends are not independent (they would probably tend to be similar) e 0.01(0.01) = 0.0001 f 0.34 + 0.34 - 0.34(0.34) = 0.56 g 0.07/(0.24 + 0.28 + 0.07) = 0.12

Chapter 7

7.1 a 0, 1, 2, 3, 4, 5, 6, 7 b
X             0      1      2      3      4      5      6      7
Probability   0.20   0.05   0.05   0.05   0.05   0.50   0.05   0.05
7.3 a 1 - (0.02 + 0.04 + 0.34 + 0.24 + 0.28 + 0.07) = 0.01 b [Figure: probability histogram; horizontal axis Self-rated ability, vertical axis Probability from 0.00 to 0.30] c left-skewed d 5,
7.9 Weight 0 with 9/16, 1 with 6/16, and 2 with 1/16: 0(9/16) + 1(6/16) + 2(1/16) = 8/16 = 0.5
7.10 a histogram for bulls below left
[Figure: probability histograms for Bulls (left) and Cows (right); vertical axes Probability.]
b The histogram for cows
is shown above right; both histograms are skewed right, but the histogram for bulls has a second peak at 2 c bullocks, since 0.28 is greater than 0.06 d 0.66 + 0.04 + 0.28 = 0.98 e 0.78 + 0.11 + 0.06 = 0.95 f No; the Independent “And” Rule does not apply because the probability of owning no cows could be affected by whether or not bullocks are owned (B and C are likely to be dependent) g \(\mu_B\) = 0(0.66) + 1(0.04) + 2(0.28) + 3(0.02) = 0.66 h \(\mu_C\) = 0(0.78) + 1(0.11) + 2(0.06) + 3(0.03) + 4(0.02) = 0.40 i (66(0) + 4(1) + 28(2) + 2(3))/100 = 0.66, which equals the mean j 0, because a family never owns 0.40 cow
7.12 a 0.5 is the average of the numbers 0 and 1 b \(\sqrt{0.5^2(0.5) + 0.5^2(0.5)} = 0.5\) c 0.5 makes sense as the typical distance of the numbers 0 and 1 from their mean 0.5
7.13 a \(\sigma_B = \sqrt{(0 - 0.66)^2(0.66) + (1 - 0.66)^2(0.04) + (2 - 0.66)^2(0.28) + (3 - 0.66)^2(0.02)} = 0.95\) b \(\mu_{B+C}\) = 0.66 + 0.40 = 1.06 c The formula can be used only if the random variables are independent, but as discussed above, B and C are likely to be dependent
7.15 a We cannot use the 68-95-99.7 Rule to find probabilities because the distribution is not normal b \(\mu\) = 1(0.01) + 2(0.02) + 3(0.04) + 4(0.34) + 5(0.24) + 6(0.28) + 7(0.07) = 4.9 c The typical distance of values from their mean couldn’t be 0.012 or 0.12 (the histogram would have too little spread) or 12.0 (the histogram would have too much spread) Alternatively, we consider that the values range from 1 to 7 d 4.9 + 2(1.2) = 7.3 e P(X > 7.3) = 0 f The probability of being more than 2 standard deviations above the mean equals 0, not (1 - 0.95)/2 = 0.025 as it would be if the 68-95-99.7 Rule held The distribution is skewed left, not normal, and not at all smooth because only seven values are possible g When X = 1, Y = (50/3)(1) - 50/3 = 0 When X = 7, Y = (50/3)(7) - 50/3 = 100 h \(\mu_{-50/3 + (50/3)X}\) = -50/3 + (50/3)(4.9) = 65 i \(\sigma_{-50/3 + (50/3)X}\) = (50/3)(1.2) = 20
7.18 mean (9/5)(30) + 32 = 86, standard deviation (9/5)(5) = 9
7.20 a 0.66 + 0.66 = 1.32 b No, because standard deviations are not
additive
7.22 a mean 4.7 + 4.7 = 9.4, standard deviation \(\sqrt{1.0^2 + 1.0^2} = 1.4\) b The mean of the total days worked by both of them in a week can be computed as in part (a) but not standard deviation because days worked by co-workers would not be independent (they could tend to be similar, or to offset one another)
7.24 a The number of children is fixed, so there is a fixed sample size n = … There is the same probability of sickle-cell disease each time, p = 0.25 The births are independent of each other, and for each child there are just two possibilities: disease or not b n = … and p = 0.25
7.25 a not binomial because there are more than two possible values b not binomial because sample size n is not fixed c X is binomial d not binomial because sampling without replacement from a relatively small population results in dependence of selections
7.27 a when we sample from 100 people b 25 c 0.25
7.28 Depending on socio-economic level of students at a particular high school, there could be fewer or more single-child families A private high school could tend to have students from smaller, wealthier families If the sample is biased, then the distribution of sample proportion would not be centered at population proportion 0.05
7.29 a If X = 3 and n = 100, \(\hat{p}\) = 3/100 = 0.03 b If \(\hat{p}\) = 0.01 and n = 200, X = 0.01(200) = 2 c Mean is np = 50(0.02) = 1 and standard deviation is \(\sqrt{np(1 - p)} = \sqrt{50(0.02)(0.98)} = 1.0\), approximately d On average, we expect to get about 1 married student in our sample, and the number who are married will tend to differ from this mean by about 1 e Mean is p = 0.02 and standard deviation is \(\sqrt{p(1 - p)/n} = \sqrt{0.02(0.98)/50} = 0.02\), approximately f Mean is np = 500(0.02) = 10 and standard deviation is \(\sqrt{np(1 - p)} = \sqrt{500(0.02)(0.98)} = 3.1\), approximately g Mean is p = 0.02 and standard deviation is \(\sqrt{p(1 - p)/n} = \sqrt{0.02(0.98)/500} = 0.01\), approximately h On average, we expect the proportion of married students in our sample to be about 0.02, and the
proportion who are married will tend to differ from this by about 0.01 i Sample proportions are closer to 0.02 for the larger samples (500)
7.30 The mean is 0.02 but we cannot report the standard deviation because there is too much dependence; since 100 is not more than 10 times 40, the rule of thumb for approximate independence is not satisfied
7.32 a
7.34 a Centers would both be 0.25 b There would be less spread for samples of 40 c Shape would be closer to normal for samples of 40
7.35 a more normal for the larger sample size, 500 b The distribution of X is approximately normal for samples of 500 but not 50: 500(0.02) and 500(0.98) are both at least 10, but 50(0.02) is less than 10 c The distribution of \(\hat{p}\) is also very right-skewed because it has the same shape as X
7.36 a mean 0.25 b standard deviation \(\sqrt{0.25(1 - 0.25)/40} = 0.07\) c shape close to normal because 40(0.25) = 10 and 40(1 - 0.25) = 30 are both at least 10
7.38 a (2) Sample proportion of women in random samples of 100 is most normal (balanced population, large sample size) and (3) sample proportion of blacks in random samples of 10 is least normal (unbalanced population, small sample size) b (4) Smallest standard deviation is 0.03, for p = 0.10 and n = 100 (sample proportion of blacks in samples of 100); (1) largest standard deviation is 0.16, for p = 0.50 and n = 10 (sample proportion of women in samples of 10)
7.40 the same
7.42 X is discrete and it has an infinite number of possible values: 1, 3, 5, etc
7.43 a continuous quantitative b categorical c discrete quantitative d continuous quantitative e continuous quantitative f continuous quantitative g discrete quantitative
7.45 a No, P(X ≥ 2) and P(X > 2) aren’t necessarily equal because X is discrete b Yes, P(\(\bar{X}\) ≥ 2) and P(\(\bar{X}\) > 2) should be equal because \(\bar{X}\), the mean of all values, is continuous c Yes, because X is continuous d Yes, because the random variable is continuous
7.47 [Figures: probability histogram of X, vertical axis Probability from 0.0 to 0.7; normal curve of Circumference (inches), axis from 17 to 47.]
b c The one on the bottom is more normal
7.48 a 0.997, because these values are 3 standard deviations on either side of the mean b (1 - 0.68)/2 = 0.16, since 127 is one standard deviation above the mean c 112 and 132 (within 2 standard deviations of the mean) d 112 (2 standard deviations below the mean)
7.49 a 0.025 b This is somewhat unusual (on the large side) c between 0.0015 and 0.025 because 20 is between 17 and 22 d this is very unusual (extremely small) e 12 inches below the mean, where there are 5 inches in a standard deviation: (-12)/5 = -2.4, so it is 2.4 standard deviations below the mean f z = (20 - 32)/5 = -2.4 g x = 32 + 0.8(5) = 36
7.51 a z = (117 - 100)/15 = +1.13; unexceptional b z = (144 - 100)/15 = +2.93; exceptional c z = (132 - 100)/15 = +2.13; borderline d z = (129 - 100)/15 = +1.93; borderline
7.53 a less than 0.025 b more than 0.025 c more than 0.025 d less than 0.025 e more than 0.025 f more than 0.025
population e 23 is a count f 0.15 is a proportion g observational study
7.77 a overestimates b underestimates c overestimates d not obvious
7.84 a X1 + X2 represents the thickness of bread for a sandwich constructed from two individual slices, and 2X1 represents the thickness of bread for a sandwich constructed by folding a single slice b There is more of a tendency for extremely thin or thick sandwiches when a sandwich is constructed by folding over a single slice c Mean of 2X1 is 2(0.5) = 1.0 and mean of X1 + X2 is 0.5 + 0.5 = 1.0; they are equal d Standard deviation of 2X1 is 2(0.01) = 0.02 and the standard deviation of X1 + X2 is \(\sqrt{0.01^2 + 0.01^2} = 0.014\) Standard deviation of 2X1 is larger because there is more of a tendency for extremely thin or thick sandwiches when a sandwich is constructed by folding over a single slice
7.87 a categorical (under 23 or not)
7.54 a b c d
7.57 a between -1 and -2 because 0.10 is between 0.16 and 0.025 b greater than +3 because 0.001 is less than 0.0015
7.59 a z =
(106 Ϫ 122)/5 ϭ Ϫ3.2; probability of this short less than 0.0015 b z ϭ (125 Ϫ 122)/5 ϭ ϩ0.6; probability of this tall between 0.5 and 0.16 c z ϭ (116 Ϫ 122)/5 ϭ Ϫ1.2; probability of this short between 0.16 and 0.025 2.4 5.6 8.8 12 15.2 18.4 21.6 X b 7.61 a Because 0.02 is between 0.025 and 0.0015, the z-score is between ϩ2 and ϩ3, so the height is between 122 ϩ 2(5) ϭ 132 and 122 ϩ 3(5) ϭ 137 centimeters b Because 0.20 is between 0.16 and 0.5, the z-score is between Ϫ1 and 0, so the height is between 122 Ϫ (5) ϭ 117 and 122 Ϫ 0(5) ϭ 122 centimeters 7.63 a z ϭ Ϫ1.8, probability between 0.025 and 0.05 b z ϭ ϩ2.9, probability between 0.005 and c z ϭ ϩ2.22, probability between 0.025 and 0.01 d z ϭ Ϫ2.41, probability between 0.005 and 0.01 7.65 a z ϭ (106 Ϫ 122)/5 ϭ Ϫ3.2; very short b z ϭ (125 Ϫ 122)/5 ϭ ϩ0.6; somewhat tall c z ϭ (116 Ϫ 122)/5 ϭ Ϫ1.2; somewhat short 0.03 0.07 0.11 0.15 0.19 0.23 0.27 Sample proportion c Yes, because most of the area under the curve is to the left of 20 d 7.67 a less than 0.01 b more than 0.01 c less than 0.01 d more than 0.01 7.69 a b c d z ϭ Ϫ2.576, x ϭ 2.45 Ϫ 2.576(0.17) ϭ 2.01 z ϭ ϩ1.645, x ϭ 2.45 ϩ 1.645(0.17) ϭ 2.73 z ϭ Ϫ1.960, x ϭ 2.06 Ϫ 1.960(0.17) ϭ 1.73 z ϭ ϩ2.326, x ϭ 2.06 ϩ 2.326(0.17) ϭ 2.46 7.72 a pie chart b p because it describes the entire population of assaults c not necessarily, because sample proportion pN varies d closer to 0.20 because larger samples behave more like the 11.4 16.0 20.6 25.2 29.8 34.4 39.0 X e No, because there is almost no area under the curve to the left of 0.05 700 Solutions to Selected Exercises 7.89 a mean 25(0.5) ϭ 12.5, standard deviation 225(0.5)(0.5) = 2.5 b z ϭ (11 Ϫ 12.5)/2.5 ϭ Ϫ0.60 c a bit low d mean 0.5, standard deviation 20.5(0.5)>25 = 0.1 e z ϭ (0.44 Ϫ 0.50)/0.1 ϭ Ϫ0.60 f 0.6 7.91 a between 0.84 and 0.975 b between 0.975 and 0.9985 c between 0.16 and 0.50 d between 0.975 and 0.9985 e between 0.84 and 0.975 f between and 0.0015 8.15 a population mean m b 20 8.17 a population mean b 
population standard deviation and sample size (c) population shape and sample size
8.19 (a) Both distributions are centered at 10.5. (b) There should be less spread for the means of 16 selections. (c) Shape is more normal for 16 selections than for the smaller sample size.
8.21 (a) to the right of 69 (b) at 69 (c) at 69 (d) to the left of 69
8.23 (a) 2.8/√16 = 0.7 (b) 2.8/√49 = 0.4
8.25 (a) extremely right-skewed (b) somewhat right-skewed (c) approximately normal
Chapter 8
8.1 (a) population proportion p, a parameter (b) population mean μ, a parameter
8.2 (a) sample mean x̄, a statistic (b) population proportion p, a parameter (c) 20
8.5 (a) both distributions centered at 0.66 (b) less spread for samples of 50 students (c) shape more normal for samples of 50 students
8.6 (a) cannot use normal approximation because sample is too small (b) sample proportion not centered at 0.66 because sample is biased; premeds may tend to have a different graduation rate (c) can use methods presented
8.7 (a) standard deviation √(0.66(1 − 0.66)/90) = 0.05, sample proportion 64/90 = 0.71, standardized sample proportion z = (0.71 − 0.66)/0.05 = +1, probability approximately 0.16
8.11 (a) z = (0.93 − 0.88)/0.02 = +2.5 (b) between 0.025 and 0.0015 (c) between 0.01 and 0.005 (d) very unlikely (e) (2) (f) not improbable
8.13 (a) mean 0.80, standard deviation √(0.80(1 − 0.80)/64) = 0.05 (b) The shape is approximately normal because 64(0.8) and 64(0.2) are both at least 10. (c) [graph not reproduced: sample proportion believing in God for samples of size 64, axis from 0.65 to 0.95] (d) No, because 48/64 = 0.75 is just 1 standard deviation below the mean (e) 0.80 is a parameter (f) p (g) 0.75 is a statistic (h) p̂
8.27 (a) more than 0.5 (b) less than 0.16 (c) more than 0.025
8.28 z = (50 − 42.6)/3.0 = +2.47, which is between +2.326 and +2.576, so the probability is between 0.01 and 0.005
8.32 (a) z = (212 − 217)/(126/√3,115) = −2.21 (b) between 0.025 and 0.0015 (c) between 0.01 and 0.025 (d) very unlikely (e) (2) (f) z = (212 − 217)/(126/√9) = −0.12, close to zero, so the sample mean is fairly close to 217
8.35 (a) two categorical variables (b) one each, quantitative and categorical (c) one quantitative variable (d) one categorical variable
8.36 (c)
8.40 The sample is too large relative to population size (should be no more than one-tenth population size, and it is half) and so samples without replacement have too much dependence.
8.45 (a) approximately normal (b) left-skewed/low outliers (c) right-skewed/high outliers (d) right-skewed/high outliers
8.47 4, 3, 2, 1, respectively
8.49 40,000, because 40,000(1/4,000) = 10
8.51 (a) 0.40 (b) √(0.40(1 − 0.40)/100) = 0.05 (c) positive (d) 0.10 (e) √(0.10(1 − 0.10)/100) = 0.03 (f) negative (g) z = (0.25 − 0.40)/0.05 = −3, unusually low; z = (0.25 − 0.10)/0.03 = +5, unusually high; z = (0.46 − 0.40)/0.05 = +1.2, not unusual; z = (0.16 − 0.10)/0.03 = +2, unusually high
8.53 (a) categorical; consider the numbers of bedrooms in new single-family houses completed in 2003 and summarize with mean (b) close to symmetric [probability histogram of sample proportion not reproduced] (c) 0.51 (d) √(0.51(1 − 0.51)/6) = 0.20 (e) symmetric but not normal because the sample size is relatively small: np = 6(0.51) = 3.06 is less than 10
8.54 0.40 negative (less than 0.51), 0.53 positive (greater than 0.51), 0.90 positive (greater than 0.51)
8.55 0.90 most extreme (farthest from 0.51), 0.53 least extreme (closest to 0.51)
8.56 (a) p̂ = 12/30 = 0.40, z = (0.40 − 0.51)/0.09 = −1.22, not unusual (b) p̂ = 16/30 = 0.53, z = (0.53 − 0.51)/0.09 = +0.22, not unusual (c) p̂ = 27/30 = 0.90, z = (0.90 − 0.51)/0.09 = +4.33, unusually many
8.57 (a) 0.51 plus or minus 2(0.09): between 0.33 and 0.69 (b) No, because 0.70 is outside the range of sample proportions we would see 95% of the time (c) 0.32
8.65 (a) Proportion living on campus is 222/445 = 0.499, or 0.50 rounded to two decimal places.
8.64 (a) mean 0.44, standard deviation √(0.44(1 − 0.44)/66) = 0.06, shape approximately normal because 66(0.44) and 66(1 − 0.44) are both at least 10
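The standard deviation and the normality check quoted in 8.64 (a) can be reproduced with a few lines of Python. This is only a minimal sketch and not part of the original solutions: the function name `sample_proportion_summary` is a hypothetical helper, and the check applies the rule of thumb used throughout these solutions (np and n(1 − p) both at least 10).

```python
import math

def sample_proportion_summary(p, n):
    """Return (mean, standard deviation, normal_ok) for sample proportion.

    Hypothetical helper for illustration only: p is the population
    proportion and n is the sample size.
    """
    mean = p
    sd = math.sqrt(p * (1 - p) / n)
    # Rule of thumb from these solutions: both expected counts at least 10.
    normal_ok = n * p >= 10 and n * (1 - p) >= 10
    return mean, sd, normal_ok

# Exercise 8.64 (a): p = 0.44, n = 66
mean, sd, normal_ok = sample_proportion_summary(0.44, 66)
print(mean, round(sd, 2), normal_ok)
```

Calling the same helper with p = 0.51 and n = 6 reports that the normal approximation is not appropriate, in line with 8.53 (e).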
8.64 (b) A sample proportion of 0.42 is not at all unusual because it is just a third of a standard deviation below the mean.
8.53 (continued) (f) 0.51 (g) √(0.51(1 − 0.51)/24) = 0.10 (h) approximately normal because the sample size is relatively large: np = 24(0.51) = 12.24 and n(1 − p) = 24(0.49) = 11.76 are both at least 10
8.65 (continued) (b) mean 0.5, standard deviation √(0.5(1 − 0.5)/10) = 0.16. Shape is symmetric and somewhat normal because although the sample size is small, the underlying distribution is nicely balanced. (c) mean 0.47, standard deviation 0.13 [histogram not reproduced: frequency of sample proportion on campus in samples of size 10, axis from 0.2 to 0.7] (d) The mean and standard deviation conform fairly well, and the shape is fairly normal.
8.71 (a) population mean μ (b) 40 negative, 50 positive, 38 negative, 37 negative (c) 50 most extreme (farthest from 42.6) and 40 least extreme (closest to 42.6)
8.72 (a) 0, 1, 2, 3, 4, 5, 6, (b) [graph not reproduced] (c) left-skewed (d) 200 (larger sample because it has less spread) (e) (1) extremely improbable (2) very improbable (3) not improbable (f) compare means
8.75 (a) mean 93, standard deviation 15/√144 = 1.25 (b) mean 93, standard deviation 15/√36 = 2.5 (c) sample mean MQ greater than 93 (d) z = (96.5 − 93)/1.25 = +2.8, which is high enough to suggest that these children are not representative in terms of MQ: theirs tend to be higher than usual (e) (85.7 − 93)/2.5 = −2.92, which is low enough to suggest that these children are not representative in terms of MQ: theirs tend to be lower than usual
8.76 (a) 15/√25 = 3 (b) 93 plus or minus 2(3): the interval is (87, 99) (c) 0.32
8.81 (a) Mean is 610, standard deviation is 72/√40 = 11.4, shape should be close to normal because the population is normal [histogram not reproduced: frequency of X̄ = mean Math SAT score in samples of size 40, axis from 585 to 630] (b) mean 610.5, standard deviation 11.5 (c) Means and standard deviations are very close; shape is bell-shaped but has a low outlier (d) If we continued to take repeated random samples, some sample means would be on the high side to balance out those that are on the low side.
Chapter 9
9.4 (a) 0.53 (b) It doesn't make sense to set up a 95% confidence interval for the proportion of all Topeka voters who opposed the repeal because the data actually already represent all voters from that election. (c) p
9.5 (a) The survey from 1999 allowed for overlapping categories because they total to more than 100%. The survey from 2004 apparently did not because readers specified the main reason for choosing the last book they had read. (b) 0.66 (c) The difference can be attributed to the questions: For 66% the cover was one of several reasons for choosing a book; for 7% the cover was the predominant reason. (d) (1) This interval has a 95% probability of containing the proportion of all people for whom a book's cover was the deciding factor in choosing the last book they read.
9.6 Those who come forward may be more likely, or less likely, to be male than abuse victims in general.
9.7 population proportion, 0.10
9.8 (a) 0.60 ± 2(0.035) = (0.53, 0.67) (b) They should check if their sample was representative of all recent U.S. college graduates.
9.9 (a) √(0.10(1 − 0.10)/708) = 0.01 (b) 68, 640 (c) 0.10 ± 2(0.01) = (0.08, 0.12) (d) only values greater than 0.05 (e) Yes, because the range of plausible values is strictly higher than 0.05 (f) Yes, because the population size would be much more than 10 times 708
9.11 0.20 ± 2√(0.20(1 − 0.20)/504) = 0.20 ± 0.04 = (0.16, 0.24) or 0.20 ± 1.96√(0.20(1 − 0.20)/504) = 0.20 ± 0.03 = (0.17, 0.23)
9.14 Using actual binomial probabilities would be preferable; because there are only two nonsuccesses, a normal approximation would not be appropriate.
9.17 (a) the first survey, because all the interval's values are greater than 0.5 (b) narrower (c) n = 1/0.04² = 625
9.18 (a) A 90% confidence interval would be narrower because the multiplier is 1.645 instead of 2. (b) The point estimate for population proportion is halfway between 0.67 and 0.91: (0.67 + 0.91)/2 = 0.79 (c) The margin of error is the distance from 0.79 to
0.67 or to 0.91, which is 0.12 (d) The approximate standard deviation is half the margin of error, or 0.06 (e) (3) We are 95% sure that population proportion falls in this interval (f) Yes, because the population (several thousand) is at least 10 times the sample size (75)
9.19 (a) 1/√16,000 = 0.008 (b) Yes, because 0.52 − 0.008 is still greater than 0.5 (c) 1/√8,000 = 0.011 (d) 1/√(0.13(16,000)) = 0.022 (e) 0.53 is within margin of error (0.022) of 0.52, and so it is a plausible value and the rate for the population of Hispanics may not have dropped at all (f) larger samples
9.21 (d)
9.29 Actual binomial probabilities should be used because there were too few in one of the categories to justify use of a normal approximation.
9.35 100(0.95) = 95
9.36 (a) One point of view is that population proportion equals 0.05 and the other is that it is higher than 0.05. (b) One would hope that researchers were careful to obtain a representative sample of children and to perform the strep test carefully. (c) No: Certainly there would be more than 10 times 708 children in the city. (d) No: The sample size is large enough because we would expect 0.05(708) = 35 with and 0.95(708) = 673 without; both numbers are greater than 10. (e) p̂ = 68/708 = 0.10, z = (0.10 − 0.05)/0.01 = +5 (f) less than 0.0015 because +5 is greater than +3 (g) (1) If researchers suspected that resistance rates were going to be higher in a particular city, the data would be more convincing than if they had just gone "fishing" for unusual values.
9.37 (a) greater-than (b) H0: p = 0.5 vs. Ha: p > 0.5
9.38 (a) shaded area shown in first graph below (b) shaded area shown in graph above (c) shaded area shown in graph below (d) shaded area shown in graph above [graphs not reproduced: normal curves for sample proportion p̂, marked at 0.50 and 0.52, and for z, marked at 0.85] (e) No, because neither of the P-values is small (f) Yes, because the null hypothesis that p = 0.5 is not rejected
9.40 (a) 2,858/5,776 = 0.495 (b) p̂ (c) Yes, because it is less than 0.5 (d) greater than 0.16 because −0.79 is to the right of −1 (e) z
9.41 (a) z = 0 because p̂ and p0 are both 0.5 (b) The entire half of the normal curve to the right of 0.5 is shaded; the P-value is 0.5 (c) The entire half of the z curve to the right of 0 is shaded (d) The entire area under the curve is shaded (e) The entire area under the curve is shaded; the P-value is 1.0 (f) No, because the P-value is not small at all in either case
9.46 (a) z is more extreme for the House because the sample size is larger (b) The P-value for the House is smaller because z is more extreme (c) 0.040/2 = 0.020 (d) one-sided only, because 0.02 is less than 0.025 but 0.04 is not (e) 0.13
9.47 (a) 0.11 (it is the smallest) (b) 0.20 (it is the largest) (c) 0.20 (it is the farthest from 0.15) (d) 0.14 (it is the closest to 0.15)
9.49 We have very strong evidence that this county's proportion of Caesarians does not conform to the national rate of 0.26.
9.50 not very small
9.52 (a) Type I (rejecting null hypothesis that the draws are random, even though it is true) (b) Type II (failing to reject null hypothesis that the draws are random, even though it is false) (c) Type I, because they would invest time, money, and effort into improving the lottery when it's actually fine
9.56 (a) 0.05 is usually the default cutoff level α (b) 0.01 should be used so that an athlete is not unfairly barred from participation in his sport (c) 0.10 should be used because no immediate negative consequences would result from a positive test
9.59 (a) 11.04 is the z statistic for the test making a comparison to all Americans aged 18 to 44, because 0.955 is further from 0.855 than it is from 0.942 (b) Yes, the P-value is approximately zero, because the z-statistic is so high
9.65 (a) The interval wouldn't come close to containing 0.13 (b) H0: p = 0.06 and H0: p = 0.14 (c) p0 (d) z = 0 if p̂ = p0 (e) P-value would be 1, twice the probability of z being greater than the absolute value of 0
9.66 (d) The data provide evidence that Group A strep in the city is more resistant to macrolides than it is in the
United States in general
9.67 (b) The data fail to prove that women are more likely to die in the week after Christmas than in the week before.
9.73 P-value is 0.2148
9.76 P-value is 0.0154
9.77 P-value is 0.1977
9.96 z = +5 goes "off the charts," so tables would not be helpful; we could say the P-value is approximately zero
9.98 0.0197
Chapter 10
10.1 (a) 6.5 (b) No, because the sample would not be representative of all Iberian rock lizards
10.3 We know σ, so the standardized statistic is z
10.4 (a) μ and σ (b) 504 ± 20 = (484, 524) (c) We already know the population mean to be 504 (d) Narrowest is (4) (e) Widest is (1)
10.6 2(20/√16) = 10
10.8 (a) t procedure (b) z procedure (c) (11.08, 12.15) (d) sample mean 11.614, P-value 0.000 (e) Selections were apparently not truly random because the P-value is very small (f) Yes, because the P-value would have been half the size of 0.000 (g) No, because sample mean is actually greater than 10.5 (h) Because the sample mean was significantly greater than 10.5, people apparently perceive larger numbers to be more random (i) larger (j) smaller
10.9 (a) $9.04 (b) because the sample size (n = 82) is fairly large (c) 9.04 ± 2(1.28/√82) = (8.76, 9.32) dollars (d) Yes: $9.00 is well-contained in the confidence interval (e) narrower (f) narrower (g) Ha: μ ≠ 9.00
10.11 (a) H0: μ = 22.6 versus Ha: μ > 22.6 (b) because the standard deviation given is that of the population (c) z = (25.35 − 22.6)/(σ/√n) = +1.32 (d) not small because z is closer to 1 than to 2 (e) no (f) 2(1.32) = 2.64 would be significant
10.13 (a) The P-value is quite small (0.007), so we reject the null hypothesis and conclude that μ > 7.151 for the 20th century (b) Greater longevity in the latter century is a possible explanation for longer tenures (c) 2(0.007) = 0.014
10.14 because the standard deviation (1.362) is clearly more than
10.16 (a) 90.4 ± 1.86(9.7/√9) = 90.4 ± 6.0 = (84.4, 96.4) words per minute (b) 1.645 (c) narrower (d) narrower (e) wider
10.18 (a) H0: μ = 1.0 versus Ha: μ > 1.0 (b) t = (1.4 − 1.0)/(1.35/√10) = +0.94 (c) The standardized sample mean is identified as t because it is calculated using s and not σ (and the sample size is small) (d) right-tailed (e) not small at all (f) The data do not provide evidence that mean number of calves sired by all male beluga whales in captivity exceeds 1.0
10.20 The shape of the distribution of ages is close enough to normal but that of weights is not, because of right skewness and high outliers, and the fact that the sample size is rather small.
10.22 (a) H0: μ = 44 versus Ha: μ < 44 (b) to justify calling the standardized statistic "t" (otherwise, because of the small sample size, the distribution would not be symmetric and bell-shaped) (c) less than 44 because t is negative (d) 0.002 is small (e) The average price of Merlot wines is less than the company's wine prices in general (f) 2(0.002) = 0.004
10.23 (a) B (b) B (c) B
10.26 (a) 0.05 (b) 19; (c) one (0.020) (d) It will not always be the case that exactly one P-value is small enough; sometimes none will be small enough, sometimes maybe two (e) The interval will not contain 610.44 (f) because about half of the scores are below the mean and the other half above
10.27 (a) x̄ is approximately normal because the sample is large (192) (b) Standardized sample mean follows an approximate z distribution because for a large sample, s is close to σ (c) 57 ± 2(16/√192) = 57 ± 2.3 = (54.7, 59.3) (d) t (e) 40 is not a plausible value for population mean because it is below the interval, not inside (f) (57 − 40)/(16/√192) = 14.72 (g) 40 is not a plausible value for population mean because then standardized sample mean would be too large to be believable (h) Ha: μ ≠ 40 (i) Yes, because 40 is not contained in the confidence interval
10.30 (a) We have strong evidence that all male workers average more than 40 hours a week (b) No, because according to the test 40 is not a plausible value for population mean
10.33 (a) [histogram not reproduced: frequency of goat scores (control)]
10.60 (a) (b) H0: μ = versus Ha: μ > (c) A formal test is not necessary
to establish that the goats did not perform significantly better than chance because their sample mean (8.8) was worse than chance (d) No. We can say that we have no compelling evidence that goats in general perform better than chance with the given social cues (e) 8.8 ± 2(1.8/√23) = 8.8 ± 0.75 = (8.05, 9.55)
(b) (55.07, 71.09) (narrower) (c) t = (63 − 50)/(15/√12) = 3.00 (d) Yes, because the probability of t being at least as high as 3.00 is less than 0.01 (e) Ha: μ > 50 (f) paired
10.37 (a) negative (b) positive (c) positive (d) negative (e) negative (f) negative (g) positive
10.71 (a) 0.0089 (b) (3)
10.73 (a) Age at first conception would also be right-skewed for human males because most would occur between the ages of 20 and 30, a few within a few years below 20, but some as much as 20 or 30 years more than 30 (b)
Variable    N   Mean   StDev  SE Mean  95.0% CI
BelugaAge   7   13.00  2.71   1.02     (10.49, 15.51)
(c) Ha: μ < 12
10.82 (a) (51.89, 65.01) (b) 134 minutes a day is not plausible for students' population mean; they apparently watch much less (c) H0: μ = 134 versus Ha: μ ≠ 134
Chapter 11
11.1 (a) hypothesis test to determine that conditions of stress affect speech rates of stutterers in general
11.2 (a) two-sample (b) There is no relationship between age (adult or juvenile) and bucket-selection performance; or, population mean bucket-selection performance scores are the same for adults and juveniles. Symbolically, the latter can be written as H0: μ1 = μ2
11.4 (a) paired (b) by reporting the standard deviation of the differences between normal and stressed speech rates (c) histogram of differences (d) inference about μd based on x̄d
11.6 (a) paired (b) (2) (c) No, a histogram should be used to display the single sample of differences (d) μd = 0 (e) (16.69 − 0)/(14.81/√16) = 4.51 (f) yes (g) 16.69 ± 2.131(14.81/√16) = (8.79, 24.58) (h) Students might do better when they're older, or they might do better second time around because of having thought about the questions during the interim period (i) A better design would be to compare improvements for students who participated with a group of similar students who did not participate (j) because it is roughly normal
11.9 In each case, we see if the t statistic exceeds (a) yes (b) yes (c) no (d) yes (e) no
11.13 (a) 25.3 − 17.57 = +7.73 (b) Those who didn't study did better (c) +7.73/3.33 = +2.32 (d) No, because the P-value is not less than α = 0.05 (e) Yes, because the P-value is less than α = 0.05 (f) larger (g) Brighter students may exaggerate how little they need to study, by way of bragging, and struggling students may exaggerate how much time they study, by way of complaining. Students could simply be asked to report hours studied in the previous week, or during any typical week, or they may be given journals for one or more weeks to record study hours. Either design has disadvantages: Retrospective is subject to students' faulty memories, and prospective may influence students' behaviors. Anonymity is important because students may over-report hours studied if they think their professors will have access to the information.
11.14 (a) t statistic would be larger for the difference between ulna lengths because those sample means are farther apart (b) t statistic would be larger for the difference between ulna lengths because that standard deviation is smaller (c) 4.0 is for femurs and 7.3 is for ulnas (d) both, because both t statistics are unusually large (e) larger
11.18 (0.16 − 0.09) ± 2√(0.11²/22 + 0.06²/20) = 0.07 ± 0.05 = (0.02, 0.12)
11.19 (a) 3.2 − 2.6 = 0.6 (b) 0.6(60) = 36 minutes (c) large sample sizes (d) very strong evidence that population means are different (e) only positive numbers (f) two categorical variables
11.20 (a) Both, because the P-values, 0.021 and 0.024, are both fairly small (b) Yes, because 0.11 is not more than twice 0.06 (c) yes (d) 0.012
11.24 (a) No; the categorical explanatory variable is which company and the quantitative response variable is lead concentration. There is a column of responses for each categorical group. (b) I = 3, N = 14 (the sum of the three sample sizes) (c) [graph not reproduced: lead concentration (mg per liter), axis from 10 to 60, for IBM, Memorex, and Zenith] (d) Zenith (e) Memorex (f) Memorex (g) top-heavy (h) 21.36, 6.98, 21.91 (i) The sample standard deviations do not satisfy the rule because 21.91 is more than twice 6.98 (j) not small (k) no
11.26 (a) t² = 2.35² = 5.52 (b) It is very close to F = 5.53
11.31 gender
11.36 (a) Physical complaint is categorical (b) GSI score is quantitative (c) μ1 = μ2 = μ3 = μ4 = μ5 = μ6 = μ7 (d) Not all population means are the same (e) numerator I − 1 = 7 − 1 = 6, denominator N − I = 76 − 7 = 69 (f) no (g) If a Type II Error is made, then no attempt is made to refer the adolescents for psychiatric screening on the basis of their physical complaint; perhaps such screening would have benefitted the adolescents
11.37 (a) Cancun cheapest and Los Cabos most expensive (b) no (c) Not all the population means are equal (d) no (e) No, they provide evidence that not all four are the same (f) Los Cabos
11.38 The largest sample standard deviation (104.7) is more than twice the smallest (40.3)
11.44 (a) H0: μ1 = μ2 (b) H0: μ1 = μ2 = μ3 (c) H0: μd = 0
11.58 (a) paired (b) not large (c) no (d) The confidence interval contains zero, so we would not reject the null hypothesis claiming the population mean of differences to be zero (e) x̄d (f) width of interval is 38.8 − (−25.2) = 64; halfway between endpoints is −25.2 + 32 = 6.8 (g) Math
11.65 (a) One group of children has had cancer, another independent group has not (b) μ1 < μ2 (c) because x̄1 x̄2
11.77 (a) quite small (b) yes (c) quite large (d) no
11.82 (a) Means and standard deviations are Rainbow 1.015 and 0.446, Sucker 2.810 (highest mean) and 1.176 (highest sd), Whitefish 0.6050 and 0.0636 (b) The sample standard deviations are too different
11.85 (a) Gaze 9.00 and control 9.31. A test is not necessary because the sample actually performed worse when cued with a gaze (b) Mean of differences is 2.20, t = 2.58, P-value = 0.015 (one-sided). The juveniles did significantly better with gaze than no cue (control) (c) If the mean of differences in part (b) had been negative, there would have been no evidence of
performing better when cued with a gaze (d) Testing H0: μ = versus Ha: μ > has t = 3.28, P-value = 0.005. The null hypothesis is rejected and we conclude that, in general, juvenile goats respond to gazing, which is consistent with our conclusion in part (b)
11.27 (a) all three (b) all three (c) none
11.29 (a) elementary/middle schools (b) middle schools (c) because their sample size (6) was smallest (d) I = 4 and N = 49 + 6 + 17 + 12 = 84 (e) I − 1 = 4 − 1 = 3 are the group degrees of freedom and N − I = (49 + 6 + 17 + 12) − 4 = 84 − 4 = 80 are the error degrees of freedom (f) MSG = SSG/DFG = 9,220/3 = 3,073, MSE = SSE/DFE = 585,561/80 = 7,320 (g) F = 3,073/7,320 = 0.42 (h) no (i) larger (j) 1,250
11.34 Individuals in certain age groups could be more or less likely to admit to having been involved in automobile accidents than those in other age groups
11.89 The two-sample confidence interval (−6.522, −3.350) contains only negative numbers, suggesting times are shorter for freestyle overall on average
11.91 (a) alternative (b) Mean of sampled differences is 14.75, standardized mean is t = 3.46, P-value is 0.001 (c) yes (d) (6.36, 23.13) (e) 10, 20 (f) because certain universities would attract students with varying relative strengths or weaknesses in math (g) two quantitative
11.96 (a) P-value is 0.002 so the difference is significant (b) those without pierced ears (c) gender (d) First separate males and females, then do two two-sample tests
11.97 (−6.671, −5.431) Yes, because inches is contained in the interval
Chapter 12
12.17 (a) p̂1 and p̂2 (b) Yes, because chi-square would be 8.4161 (c) ANOVA
12.24 (a) men (b) (2) and (3)
12.28 (a) cell phones (b) gender and carrying cell phones (c) gender and carrying cell phones
12.30 (a) 0.182, 0.164 (b) X1 and X2 (c) a bit smaller than 0.05 (d) Yes, because the P-value is on the small side (e) smaller than 3.84 (f) Type II (g) Type I
12.31 (a) Taking Celebrex or not is explanatory and having heart problems or not is response (b) less than 0.05 (c) greater than 3.84
12.2 (a) religion (b) 0.84 (c) 0.406
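As a check on the arithmetic in 11.29 above (parts d through g), the degrees of freedom, mean squares, and F statistic can be recomputed from the sums of squares and group sizes. This is a minimal sketch rather than the textbook's own method: `anova_f` is a hypothetical helper name, and the second group size, 6, is an assumption taken from part (c) and the stated total of 84.

```python
def anova_f(ssg, sse, group_sizes):
    """Return (DFG, DFE, MSG, MSE, F) for a one-way ANOVA table.

    Hypothetical helper for illustration: ssg and sse are the group and
    error sums of squares; group_sizes lists the sample size of each group.
    """
    i = len(group_sizes)   # number of groups, I
    n = sum(group_sizes)   # total sample size, N
    dfg = i - 1            # group degrees of freedom
    dfe = n - i            # error degrees of freedom
    msg = ssg / dfg        # mean square for groups
    mse = sse / dfe        # mean square for error
    return dfg, dfe, msg, mse, msg / mse

# Exercise 11.29: SSG = 9,220, SSE = 585,561, group sizes 49, 6, 17, 12
# (the size 6 is assumed from part (c) and the total N = 84)
dfg, dfe, msg, mse, f = anova_f(9220, 585561, [49, 6, 17, 12])
print(dfg, dfe, round(msg), round(mse), round(f, 2))
```

An F statistic this far below 1 means the variation among group means is smaller than the variation within groups, which fits the answer "no" in part (h).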
12.4 H0: p1 = p2 versus Ha: p1 > p2; z = ((0.83 − 0.72) − 0)/√(0.79(1 − 0.79)(1/282 + 1/162)) = 2.74; because z is greater than 2.576, the P-value is less than 0.005 and we conclude the difference is significant
12.5 (a) large (b) small (c) yes
12.8 The interval comes very close to containing zero, and the P-value is also somewhat borderline, just under 0.05
12.10 (a) −2.55 (b) 6.492 (c) −4.27 (d) 18.226
12.12 (a) for those requesting Paxil (b) Paxil request: 14/51 = 0.27; general request: 1/50 = 0.02; no request: 2/48 = 0.04 (c) The proportion seems very different for those requesting Paxil (d) 5.82 = 17(51)/149; 5.70 = 17(50)/149; 5.48 = 17(48)/149; 45.18 = 132(51)/149; 44.30 = 132(50)/149; 42.52 = 132(48)/149 (e) 5.82/51 = 5.70/50 = 5.48/48 = 0.114 (f) for those requesting Paxil (g) 11.503 = (14 − 5.82)²/5.82; 1.481 = (37 − 45.18)²/45.18; 3.880 = (1 − 5.70)²/5.70; 0.500 = (49 − 44.30)²/44.30; 2.207 = (2 − 5.48)²/5.48; 0.284 = (46 − 42.52)²/42.52 (h) "patients" requesting Paxil and receiving Paxil (i) 19.855 = 11.503 + 1.481 + 3.880 + 0.500 + 2.207 + 0.284 (j) There are r = 3 possibilities for the row variable and c = 2 possibilities for the column variable; the degrees of freedom are (r − 1)(c − 1) = 2(1) = 2 (k) large, because it is considerably greater than 6.0 (l) The size of chi-square suggests that, in general, there is a relationship between a patient's requests for medications and what, if anything, the doctor prescribes (m) because the expected counts would be too small (at most 2)
12.41 (a) When the three individual groups are compared the chi-square statistic is fairly large, the P-value is fairly small, and there is evidence at the 0.05 level of a relationship between treatment and nausea. In contrast, when the acupuncture and placebo groups are combined, the chi-square statistic is not large, the P-value is not small, and there is no evidence of a relationship. (b) Report those on the left, because apparently acupuncture versus placebo makes a difference, and these groups should not be combined (c) Both, because all counts are at least 5
12.42 (a) (given the two counts of 19, there is enough information to fill in the rest of the table) (b)
Observed             One Kind  Two Kinds  Three Kinds  Total
Sequential choice       19        19           3         41
Simultaneous choice     10        21          25         56
Total                   29        40          28         97
(c) more often (d) less often (e) yes (f) one that people tend to buy in bulk
12.44 (a) sample proportions 389/885 = 0.44, 248/2,226 = 0.11, z = 20.46, P-value = 0.000, conclude drug users are more likely to carry weapons than non-users (b) sample proportions 389/637 = 0.61, 496/2,474 = 0.20, z = 20.46, P-value = 0.000, conclude those who carry weapons are more likely to use drugs (c) chi-square = 418.737, P-value = 0.000, conclude taking drugs or not and carrying weapons or not are related (d) both
12.49 (a) chi-square = 10.161, P-value = 0.001; females were more likely to eat breakfast (b) chi-square = 7.428, P-value = 0.006, females were more likely to carry a cell phone (c) Those who did carry a cell phone were more likely to eat breakfast: 0.56 of those who carried a cell phone ate breakfast; 0.54 of those who didn't carry a cell phone ate breakfast; the P-value is 0.741 so the relationship is not significant
13.1 for departures versus gates, on the left (there is less scatter and s will be smaller)
13.2 (a) because longer trips cost more money (b) No, it is doubtful that representatives would base the length of their trip on how much money is paid (c) −683 + (1,176)(1) = 493 dollars (d) $3,401 (e) s (f) 1,176 dollars (g) b1 (h) The intercept −683 should be interpreted theoretically as the y-value where the line crosses the y-axis; in actuality there would be no trips of length zero (i) b0 (j) b0 (k) all of these (l) moderately strong
13.6 (d)
13.8 (a) in the sample (b) 39 − 2 = 37 (c) both (d) 0.000 (e) (2) and (4)
13.9 Ha: β1 > 0 because we expect the relationship to be positive
13.10 weak evidence of a strong relationship
13.12 strong evidence of a weak relationship
13.13 (a) − = (b) (2) because r is of moderate size but the
P-value is not small at all (c) (d) 0.23/2 = 0.115 (e) chi-square
13.15 (a) Because children grow taller as they get older, we expect a positive relationship: correlation r greater than zero. Age tells us quite a lot, but not everything, about how tall a child will be, so we expect r to be somewhere between 0.5 and 1.0. This should be true for the larger population of children, not just the sample. (b) (2.0, 2.5) (c) b1 large (d) weak evidence of a strong relationship
13.18 (a) small (b) confidence interval (c) 55.3 ± 2(1.2) = (52.9, 57.7) (d) Yes, it came quite close (e) 55.3 ± 2(1.2)/√25 = (54.82, 55.78) (f) Yes, it came quite close (g) wider (h) because depth tells us a lot about age; without information about depth, it is difficult to pin down the age
Chapter 13
13.20 (a) (6) (b) (3) (c) (5) (d) (4) (e) (1) (f) (2)
13.21 (a) interval estimate of mean amount for all 5-day trips taken by the larger population of representatives (b) 2(1,176) = 2,352 dollars because 10 days is 2 times as long as 5 days (c) interval estimate of mean amount paid for all 5-day trips (d) (9,988, 12,166) (e) The interval is not very accurate because 10 is too far from the mean trip length
13.25 for departures versus gates (on the left)
13.29 A government report about all 50 states tells us everything about the population; there is no larger group to generalize to
13.34 (a) Distribution of b1 should be centered at β1 = 2.25 (b) 250 children (c) s (d) 3(2.25) = 6.75 inches
13.36 (c) [small sample with non-normal responses]
13.41 (a) closer to 0 (b) relatively large because predictions of sale price based on assessed price are not very accurate (c) Conclude there is no evidence of a relationship between sale price and assessed price for the larger population of properties
13.42 (a) Texas school districts (for each district, record rate of autism and how much mercury is released in the county where that district is located) (b) Explanatory variable (quantitative) is pounds of environmentally released mercury in 2001; response variable (also quantitative) is rate
of autism (c) 0.17/1,000 (d) b1 (e) No, the interval would not contain zero (f) much less than 0.05 (g) Setting as urban, small metro, or rural could play a role in mercury pollution, and also in autism rates via a variety of socioeconomic factors
13.48 (a) 0.90 (closer to 1), 0.08 (b) ŷ = 29.2 + 7.12x (steeper slope), ŷ = −784 + 0.47x (c) 17.55 (less prediction error), 40.15 (d) (4.32, 9.92) (interval contains only positive slopes), (−3.96, +4.89) (e) 0.000 (small P-value), 0.815 (f) 5.85 (large t), 0.24 (g) (94.96, 179.84) (narrower interval), (40.3, 234.5)
13.50 (a) [scatterplot not reproduced: velocity (feet per second) versus depth (feet)] (1) decrease (2) linear (3) high (b) the equation of the regression line (c) 10 − 2 = 8 (d) t = −6.80, P-value is 0.000; there is strong evidence of a relationship in general between depth and velocity (e) displaying and summarizing, probability, statistical inference, data production
13.54 (a) touch on point [regression plots not reproduced: Touch = 6.90442 + 0.656637 Point, S = 2.54852, R-Sq = 34.4%; Touch = 14.4103 − 0.0370370 Gaze, S = 3.14569, R-Sq = 0.1%] (b) s = 2.549 (c) (8.306, 19.949) (d) because the sample size is small (e) s = 3.146 (f) (6.892, 21.262) (g) 21.262 − 6.892 = 14.37 is wider than 4(3.146) = 12.584 (h) touch on point r = +0.59, touch on gaze r = 0.03; r is closer to 1 for the smaller s because the points are more tightly clustered (i) Smaller P-value corresponds to r closer to 1 (j) point score
Chapter 14
14.5 one categorical variable; probability
14.6 one categorical variable; displaying and summarizing (Section 4.1)
14.7 categorical explanatory and quantitative response variables; statistical inference (Chapter 11)
14.8 two categorical variables; statistical inference (Chapter 12)
14.9 two quantitative variables; displaying and summarizing (Section 5.3)
14.12 two categorical variables; probability
14.14 two quantitative variables; data production
14.17 one quantitative variable; probability
14.18 two categorical variables; data production
14.19 one quantitative variable; data production
14.21 one categorical variable; statistical inference (Chapter 9)
14.27 two categorical variables; displaying and summarizing (Section 5.2)
14.29 one categorical variable; data production
14.36 two quantitative variables; statistical inference (Chapter 13)
14.37 categorical explanatory and quantitative response variables; displaying and summarizing (Section 5.1)
14.43 categorical explanatory and quantitative response variables; data production
14.44 categorical explanatory and quantitative response variables; probability
14.46 one quantitative variable; statistical inference (Chapter 10)

... as the one observed, if population mean equaled the value μ0 proposed in the null hypothesis. This is the P-value of the test. 4. The fourth step requires us to make a statistical inference decision, the crux of Part IV. If the observed sample mean x̄ is improbably far from the claimed population mean μ0, we reject that claim and conclude that the alternative is true. Otherwise, we continue to believe that...

... about a proposed value of the unknown population mean, when the population standard deviation is also unknown. EXAMPLE 10.16 Test about a Mean When the Population Standard Deviation Is Unknown. Background: A random sample of 19 female students at a university reported their weights as follows: 110 110 112 120 120 120 125 125 130 130 132 133 134 135 135 135 145 148 159. Because the sample size is 19, we...

... relevant statistic is sample mean earnings $3,776, which is indeed less than $5,000. The standardized test statistic is z = −3.98. Software would produce these automatically from the quantitative data set entered. Alternatively, z could have been calculated by hand as z = (x̄ − μ0)/(s/√n) = (3.776 − 5)/(6.5/√446) = −3.98. 3. The P-value is the probability of a sample mean as low as (or lower than) 3.776, if the population mean...

... on the sign of the alternative hypothesis. Our decision of whether or not to reject the null hypothesis in favor of the alternative hinges on the P-value, which reports the probability of sample mean being greater than, less than, or as extreme in either direction as the one observed, under the assumption that the population mean μ equals the value μ0 proposed in the null hypothesis. H0: μ = μ0 vs. Ha... reject the null hypothesis. We conclude that the alternative hypothesis is true. There is evidence th
at the mean weight of all college females, based on our sample, is less than the population mean reported by NCHS Either our sample represents a different population from theirs, or the students were under-reporting their weights Practice: Try Exercise 10.18 on page 490 If it is not obvious that the t statistic... alternative hypotheses Of course, if our proposed sample mean falls outside the interval in the opposite direction from what the alternative claims, we could not reject the null hypothesis For instance, if a 95% confidence interval for population mean earnings is (3. 171, 4 .38 1), and we test against the alternative hypothesis Ha : m Ͼ 5.000, then we would have no evidence at all to reject the null hypothesis... 1.5> 19 A CLOSER
LOOK The sign of z in Example 10.12 is positive because the observed sample mean was greater than the hypothesized population mean The fact that z is fairly small in absolute value—less than 1— suggests th
at the sample mean x = 11.222 is relatively close to the hypothesized population mean m0 ϭ 11.
0 3 The P-value is the probability of sample mean as different (in either direction) from... included in the study a Formulate the appropriate null and alternative hypotheses (using mathematical symbols) if we want to test whether the mean number of calves sired by all captive male belugas exceeds 1.0 b The mean number of calves sired by sampled males was found to be 1.4 (thus, greater than 1) and the standard deviation was 1 .35 Find the standardized sample mean c Explain why the standardized... skunks a Formulate the appropriate null and alternative hypotheses (using mathematical symbols) if we want to test whether the mean litter size for all captive female striped skunks is less than 6 b The mean litter size for sampled females was found to be 5.8 13 (thus, less than 6), and the standard deviation was 1.109 Find the standardized sample mean, under the assumption that the null hypothesis is true... Quantitative Variable A z Hypothesis Test about the Population Mean Just as we saw for tests about a proportion, the process of carrying out a hypothesis test about the unknown population mean varies, depending on which of three forms the alternative hypothesis takes What sort of values of sample mean provide evidence against the null hypothesis in favor of the alternative depends on the sign of the alternative
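
As a rough sketch of how the three forms of the alternative lead to three different P-values, the Python snippet below standardizes a sample mean and looks up the matching normal probability. It uses the earnings figures quoted in this chapter (sample mean 3.776, standard deviation 6.5, sample size 446, hypothesized mean 5, all in thousands of dollars). The function names `normal_cdf` and `z_test`, and the labels "less", "greater", and "two-sided", are illustrative choices of ours, not notation from the text.

```python
import math


def normal_cdf(z):
    """Standard normal probability P(Z <= z), computed from the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def z_test(sample_mean, mu0, sd, n, alternative):
    """Return (z, p_value) for a z test about a population mean.

    alternative is one of "less", "greater", or "two-sided"
    (hypothetical labels for the three forms of the alternative hypothesis).
    """
    # Standardize: how many standard errors the sample mean is from mu0.
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    if alternative == "less":
        p_value = normal_cdf(z)              # area to the left of z
    elif alternative == "greater":
        p_value = normal_cdf(-z)             # area to the right of z
    elif alternative == "two-sided":
        p_value = 2 * normal_cdf(-abs(z))    # area in both tails
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Earnings example: is the population mean less than 5 (thousand dollars)?
z, p_value = z_test(3.776, 5.0, 6.5, 446, "less")
print(f"z = {z:.2f}")   # z rounds to -3.98
print(f"P-value = {p_value:.6f}")
```

With the "less" alternative the P-value is the area to the left of z, which is very small here. Running the same numbers with "greater" gives a P-value close to 1, which illustrates why a sample mean on the wrong side of the hypothesized value provides no evidence for that alternative.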