According to the Census Bureau's 2007 Current Population Survey, the mean and median income of people at least 25 years old who had a bachelor's degree but no higher degree were $46,453 and $58,886 (not necessarily in that order).
1. Which of these numbers is the mean and which is the median? Explain your reasoning. A. The median is $58,886 and the mean is $46,453. This is because economic variables are usually skewed to the left, which pulls the mean above the median. B. The mean is $58,886 and the median is $46,453. This is because economic variables are usually skewed to the left, which pulls the mean above the median. C. The median is $58,886 and the mean is $46,453. This is because economic variables are usually skewed to the right, which pulls the mean above the median. D. The mean is $58,886 and the median is $46,453. This is because economic variables are usually skewed to the right, which pulls the mean above the median.
2. Retirement seems a long way off and we need money now, so saving for retirement is hard. Among households with an employed person aged 21 to 64, only 63% own a retirement account. The mean value in these accounts is $112,300, but the median value is just $31,600. For people 55 or older, the mean is $222,100 and the median is $64,400. What explains the differences between the two measures of center? A. The distributions are probably right-skewed, because most of those with retirement savings have not saved much (giving low medians), but a few have saved hundreds of thousands or more (thus pulling the means up sharply). B. The distributions are probably left-skewed, because most of those with retirement savings have not saved much (giving low medians), but a few have saved hundreds of thousands or more (thus pulling the means up sharply). C. The distributions are probably right-skewed, because most of those with retirement savings have saved hundreds of thousands or more (giving high means), but a few have saved very small amounts (giving small medians). D. The distributions are probably left-skewed, because most of those with retirement savings have saved hundreds of thousands or more (giving high means), but a few have saved very small amounts (giving small medians).
The National Association of College and University Business Officers collects data on college endowments. In 2007, 785 colleges and universities reported the value of their endowments. When the endowment values are arranged in order, what are the positions of the median and the quartiles in this ordered list? Note: use half-integers to represent positions that fall between two actual positions. Be sure to calculate your results manually, exactly as described in the text, rather than with software, which may use slightly different definitions for the median and quartiles.
3. The median is in position (Answer to 1 decimal place)
Answer 393.0. The median's position is (n + 1)/2 = (785 + 1)/2 = 393, where n = 785 is the number of observations.
4. The first quartile is in position (Answer to 1 decimal place)
Answer 196.5
5. The third quartile is in position (Answer to 1 decimal place)
Answer 589.5
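The position rule used in questions 3 to 5 can be checked with a short Python sketch. It assumes the text's convention as described here: the median is at position (n + 1)/2, and each quartile is the median of the observations lying to its left (or right). `quartile_positions` is a hypothetical helper name, not something from the text.

```python
def quartile_positions(n: int) -> tuple[float, float, float]:
    """Return the positions (Q1, median, Q3) in an ordered list of n values.

    Assumes the text's rule: the median is at (n + 1)/2, and each quartile is
    the median of the observations to the left (or right) of that position.
    A half-integer means "average the two neighboring observations".
    """
    median_pos = (n + 1) / 2
    half = n // 2                  # observations strictly on each side of the median
    q1_pos = (half + 1) / 2        # median of the lower half
    q3_pos = (n - half) + q1_pos   # same offset, counted from the start of the upper half
    return q1_pos, median_pos, q3_pos


print(quartile_positions(785))  # prints (196.5, 393.0, 589.5)
```

The same function gives (1033593.0, 2067185.5, 3100778.0) for the 4,134,370 birth weights in the next exercise.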
Here is the distribution of the weight at birth for all babies born in the United States in 2005:
Weight                  Count
Less than 500 grams     6,599
500 to 999 grams        23,864
1,000 to 1,499 grams    31,325
1,500 to 1,999 grams    66,453
2,000 to 2,499 grams    210,324
2,500 to 2,999 grams    748,042
3,000 to 3,499 grams    1,596,944
3,500 to 3,999 grams    1,114,887
4,000 to 4,499 grams    289,098
4,500 to 4,999 grams    42,119
5,000 to 5,499 grams    4,715
6. For comparison with other years and with other countries, we prefer a histogram of the percents in each weight class rather than the counts. Explain why. A. The use of percents will help us find outlier years/countries where the columns of the histogram don't add up to 100%. B. Calculating percents makes it easier to display the data using a pie graph. C. Different years and countries may have different overall numbers of newborns, making a comparison based on the absolute numbers difficult. D. None of the answers are correct. The correct answer is C. A - By definition, if a histogram is plotted correctly and encompasses all of the data, then all of the columns have to add up to the total number of observations or to 100%. Anything else is a mistake. B - A pie graph is not used to represent distributions. D - Answer C is correct. Points Earned: 1/1 Correct Answer: C Your Response: C 7. How many babies were there? Correct Answer: 4,134,370
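As an illustration (not part of the original exercise), this Python sketch totals the counts from the birth-weight table and converts each class to a percent of all births, which is what the vertical scale of a percent histogram shows. The variable names are illustrative.

```python
# Counts of 2005 U.S. birth weights by class, copied from the table above.
classes = [
    "Less than 500 grams", "500 to 999 grams", "1,000 to 1,499 grams",
    "1,500 to 1,999 grams", "2,000 to 2,499 grams", "2,500 to 2,999 grams",
    "3,000 to 3,499 grams", "3,500 to 3,999 grams", "4,000 to 4,499 grams",
    "4,500 to 4,999 grams", "5,000 to 5,499 grams",
]
counts = [6599, 23864, 31325, 66453, 210324, 748042,
          1596944, 1114887, 289098, 42119, 4715]

total = sum(counts)                            # number of babies
percents = [100 * c / total for c in counts]   # bar heights for a percent histogram

print(f"total = {total:,}")                    # prints total = 4,134,370
for name, pct in zip(classes, percents):
    print(f"{name:>22}: {pct:5.2f}%")
```

Because every class is divided by the same total, the percents add up to 100 regardless of how many babies were born, which is what makes years and countries comparable.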
8. Make a histogram of the distribution, using percents on the vertical scale. Choose the correct histogram below.
A. Histogram I. B. Histogram II. C. Histogram III. D. Histogram IV. Histogram II is the correct one. It is easily identified by the relative heights of the three largest classes. Points Earned: 1/1 Correct Answer: B Your Response: B
9. What are the positions of the median and quartiles in the ordered list of all birth weights? Match your results below.
1. 1,033,593
2. 1,004,684.5
3. 1,004,685
4. 2,009,366.5
5. 2,009,367
6. 2,067,185.5
7. 3,100,778
8. 3,014,051.5
9. 3,014,052
10. 3,014,052.5
A. The first quartile's position is B. The median's position is C. The third quartile's position is
There are a total of n = 4,134,370 observations. The median's position is (n + 1)/2 = 2,067,185.5. The first quartile's position is the median of the first 2,067,185 observations, which gives (2,067,185 + 1)/2 = 1,033,593. The third quartile's position is the median of the last 2,067,185 observations, which gives 2,067,185 + 1,033,593 = 3,100,778. Points Earned: 0/3 Correct Answer: A:1, B:6, C:7 Your Response: A:3, B:5, C:8
10. In which weight classes do the median and quartiles fall?
1. Less than 500 grams
2. 500 to 999 grams
3. 1,000 to 1,499 grams
4. 1,500 to 1,999 grams
5. 2,000 to 2,499 grams
6. 2,500 to 2,999 grams
7. 3,000 to 3,499 grams
8. 3,500 to 3,999 grams
9. 4,000 to 4,499 grams
10. 4,500 to 4,999 grams
11. 5,000 to 5,499 grams
A. The first quartile's class is B. The median's class is C. The third quartile's class is
After finding the positions of the median and quartiles, we can find the associated classes by adding up the counts of all lighter classes to find the position at which each class begins. The following table summarizes the starting positions of the classes.
Weight                  Starts at position
Less than 500 grams     1
500 to 999 grams        6,600
1,000 to 1,499 grams    30,464
1,500 to 1,999 grams    61,789
2,000 to 2,499 grams    128,242
2,500 to 2,999 grams    338,566
3,000 to 3,499 grams    1,086,608
3,500 to 3,999 grams    2,683,552
4,000 to 4,499 grams    3,798,439
4,500 to 4,999 grams    4,087,537
5,000 to 5,499 grams    4,129,656
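The cumulative-count bookkeeping above can be sketched in Python as follows. This is a minimal sketch: `class_of` is a hypothetical helper, and it assumes that a half-integer position straddling two classes should be reported in the lower class (that situation does not arise for these data).

```python
from bisect import bisect_left
from itertools import accumulate

classes = [
    "Less than 500 grams", "500 to 999 grams", "1,000 to 1,499 grams",
    "1,500 to 1,999 grams", "2,000 to 2,499 grams", "2,500 to 2,999 grams",
    "3,000 to 3,499 grams", "3,500 to 3,999 grams", "4,000 to 4,499 grams",
    "4,500 to 4,999 grams", "5,000 to 5,499 grams",
]
counts = [6599, 23864, 31325, 66453, 210324, 748042,
          1596944, 1114887, 289098, 42119, 4715]

ends = list(accumulate(counts))                 # position of the last baby in each class
starts = [1] + [end + 1 for end in ends[:-1]]   # position of the first baby in each class


def class_of(position: float) -> str:
    """Return the weight class containing a (possibly half-integer) position."""
    # bisect_left finds the first class whose last position is >= position.
    return classes[bisect_left(ends, position)]


# Positions from the text's rule (median at (n + 1)/2, quartiles = medians of halves).
n = ends[-1]
half = n // 2
median_pos = (n + 1) / 2
q1_pos = (half + 1) / 2
q3_pos = (n - half) + q1_pos

for label, pos in [("Q1", q1_pos), ("median", median_pos), ("Q3", q3_pos)]:
    print(f"{label:>6} at position {pos:,}: {class_of(pos)}")
```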
Using the median's position, 2,067,185.5, we see that it is in the class "3,000 to 3,499 grams". Similarly, the first quartile (in position 1,033,593) falls in the class "2,500 to 2,999 grams", while the third quartile (in position 3,100,778) is in the class "3,500 to 3,999 grams". Points Earned: 0/3 Correct Answer: A:6, B:7, C:8 Your Response: A:3, B:6, C:9
We asked the students in a large first-year college class how many minutes they studied on a typical weeknight. Here are the responses of random samples of 30 women and 30 men from the class:
Women
180 120 150 200 120 90
120 180 120 150 60 240
180 360 120 240 180 180
180 150 120 180 180 115
240 170 150 180 180 120
Men
90 90 150 240 30 0
120 45 120 60 230 200
30 30 60 120 120 120
90 120 240 60 95 120
200 75 300 30 150 180
The most common methods for formal comparison of two groups use x̄ and s to summarize the data.
11. What kinds of distributions are best summarized by x̄ and s? A. Skewed distributions without outliers. B. Distributions that are fairly symmetric and free of outliers. C. Symmetric distributions, outliers make no difference. D. Distributions of economic variables, since they are usually skewed to the right. Neither the mean nor the standard deviation is a resistant measure: both are strongly influenced by outliers and skewness. Therefore only fairly symmetric distributions without outliers are good candidates for summarizing with the mean and standard deviation (Answer B). Points Earned: 1/1 Correct Answer: B
12. One over-zealous student in each group claimed to study at least 300 minutes (five hours) per night. Let's check their influence on x̄ and s. By how much does removing these observations change x̄ for the men's group? Note that negative results indicate a decrease in x̄ when the over-zealous student was removed. A. 12.86 B. 7.36 C. -6.30 D. -7.36 The mean for all of the men is 3,515/30 = 117.17, while removing the over-zealous student gives 3,215/29 = 110.86, for an overall change of 3,215/29 − 3,515/30 = -6.30.
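The changes in x̄ and s asked about in questions 12 to 15 can be reproduced with Python's standard `statistics` module, whose `stdev` divides by n − 1 as the text requires. This is a sketch; `change_without` is a hypothetical helper that removes a single occurrence of the over-zealous response.

```python
from statistics import mean, stdev

# Minutes studied on a typical weeknight, copied from the table above.
women = [180, 120, 150, 200, 120, 90, 120, 180, 120, 150, 60, 240,
         180, 360, 120, 240, 180, 180, 180, 150, 120, 180, 180, 115,
         240, 170, 150, 180, 180, 120]
men = [90, 90, 150, 240, 30, 0, 120, 45, 120, 60, 230, 200,
       30, 30, 60, 120, 120, 120, 90, 120, 240, 60, 95, 120,
       200, 75, 300, 30, 150, 180]


def change_without(data, value):
    """Return the change in (mean, s) after removing one occurrence of value."""
    trimmed = list(data)
    trimmed.remove(value)  # removes only the first matching response
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return mean(trimmed) - mean(data), stdev(trimmed) - stdev(data)


for name, data, value in [("men", men, 300), ("women", women, 360)]:
    d_mean, d_s = change_without(data, value)
    print(f"{name}: change in mean {d_mean:+.2f}, change in s {d_s:+.2f}")
# men: change in mean -6.30, change in s -7.36
# women: change in mean -6.72, change in s -12.86
```

Note that one observation out of 30 moves s far more than x̄ for the women, because 360 is far from the rest of that group.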
13. By how much does removing the over-zealous student change s for the men's group? A. -66.88 B. 6.30 C. -6.30 D. -7.36 The standard deviation for all of the men is 74.24, while removing the over-zealous student gives 66.88, for an overall change of 66.88 − 74.24 = -7.36.
14. By how much does removing the over-zealous student change x̄ for the women's group? A. 6.30 B. -12.86 C. -6.30 D. -6.72 The mean for all of the women is 165.17, while removing the over-zealous student gives 158.45, for an overall change of 158.45 − 165.17 = -6.72.
15. By how much does removing the over-zealous student change s for the women's group? A. -66.88 B. -12.86 C. -6.30 D. -7.36 The standard deviation for all of the women is 56.51, while removing the over-zealous student gives 43.65, for an overall change of 43.65 − 56.51 = -12.86.
Here are the survival times in days of 72 guinea pigs after they were injected with infectious bacteria in a medical experiment. Survival times, whether of machines under stress or cancer patients after treatment, usually have distributions that are skewed to the right.
43 45 53 56 56 57 58 66 67 73 74 79
80 80 81 81 81 82 83 83 84 88 89 91
91 92 92 97 99 99 100 100 101 102 102 102
103 104 107 108 109 113 114 118 121 123 126 128
137 138 139 144 145 147 156 162 174 178 179 184
191 198 211 214 243 249 329 380 403 511 522 598
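A minimal Python sketch of the two computations used in the next questions: counts for the 50-day classes (0 < days ≤ 50, 50 < days ≤ 100, ...) and the five-number summary. It assumes the text's quartile rule, taking each quartile as the median of the observations on one side of the overall median; `five_number_summary` is a hypothetical helper name, and calculators or software that use other quartile definitions may give slightly different values.

```python
from math import ceil

# Survival times (days) of the 72 guinea pigs listed above.
times = [43, 45, 53, 56, 56, 57, 58, 66, 67, 73, 74, 79,
         80, 80, 81, 81, 81, 82, 83, 83, 84, 88, 89, 91,
         91, 92, 92, 97, 99, 99, 100, 100, 101, 102, 102, 102,
         103, 104, 107, 108, 109, 113, 114, 118, 121, 123, 126, 128,
         137, 138, 139, 144, 145, 147, 156, 162, 174, 178, 179, 184,
         191, 198, 211, 214, 243, 249, 329, 380, 403, 511, 522, 598]


def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max; quartiles are medians of the two halves.

    For odd n the middle observation is left out of both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    return (xs[0], median(xs[:half]), median(xs),
            median(xs[len(xs) - half:]), xs[-1])


# Class k (starting at 0) holds 50*k < days <= 50*(k + 1).
counts = [0] * ceil(max(times) / 50)
for t in times:
    counts[ceil(t / 50) - 1] += 1

print(five_number_summary(times))  # prints (43, 82.5, 102.5, 151.5, 598)
print(counts)                      # prints [2, 30, 22, 8, 4, 0, 1, 1, 1, 0, 2, 1]
```

The long gap between Q3 (151.5) and the maximum (598), compared with the short gap between the minimum and Q1, is the numerical signature of the right skew.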
16. Make a histogram of the distribution using classes 50 days wide (for example, the second class has values 50 < days ≤ 100). Which of the histograms below correctly describes the distribution?
A. Histogram I. B. Histogram II. C. Histogram III. D. Histogram IV. The correct choice is Histogram III. Make sure you choose the classes exactly as specified. Note that the second class (50 < days ≤ 100) has 30 guinea pigs, and Histogram III is the only one that reflects this. Points Earned: 1/1 Correct Answer: C Your Response: C
17. Describe the distribution's main features. Mark the appropriate features below. A. Right skewed. B. Symmetrical.
C. Left skewed. D. Single peaked. E. Double peaked. F. None of the answers are correct. The distribution is best described as right skewed with a single main peak. Points Earned: 1/2 Correct Answer: A, D Your Response: A
18. Which numerical summary would you choose for these data? A. Mean and standard deviation. B. Five-number summary. C. None of the answers are correct. Since the distribution is single peaked, a numerical summary is applicable. The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers). Points Earned: 0/1 Correct Answer: B Your Response: A
19. Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as "Not Relevant." Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore, if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n − 1) and not by n as done by some calculator/software applications.
1. 42
2. 43
3. 43.5
4. 81.5
5. 82.5
6. 102.5
7. 103
8. 103.5
9. 151.5
10. 153
11. 598
12. Not Relevant.
A. Mean. B. Standard deviation. C. Minimum. D. First Quartile. E. Median. F. Third Quartile. G. Maximum. The correct numerical measure is the five-number summary. Refer to examples 2.3 and 2.5 for explanations on how to calculate the median and quartiles. Points Earned: 5/7
Correct Answer: A:12, B:12, C:2, D:5, E:6, F:9, G:11 Your Response: A:12, B:12, C:1, D:3, E:6, F:9, G:11
The table below gives the mean number of births in the United States on each day of the week during an entire year.
Day         Births
Sunday      7,374
Monday      11,704
Tuesday     13,169
Wednesday   13,038
Thursday    13,013
Friday      12,664
Saturday    8,459
20. Based on these boxplots, give a more detailed description of how births depend on the day of the week. Mark the correct answers below. A. There is a marked drop in weekend birthrates, with at least 75% of the weekday observations not overlapping with at least 75% of the weekend observations. B. There is a marked drop in weekend birthrates, with no overlap between the weekend and weekday observations. C. There is a marked drop in weekend birthrates, with an overlap of more than 75%
between the weekend and weekday observations. D. All of the days have highly skewed distributions. E. The weekend days have similar distributions. F. Most weekdays have similar distributions. The correct answers are A, E, and F. A - Note that there is no overlap between weekend observations below the third quartile and weekday observations above the first quartile, meaning that at least 75% of the weekend observations don't overlap with at least 75% of the weekday observations. B - Is wrong since there are overlapping observations between the weekends and weekdays, as can be seen by the minimal number of births during weekdays that overlap with the weekend distributions and the maximal number of weekend births that overlap with the weekday distributions. C - Is wrong, see explanation for A. D - Is wrong, since most days have fairly symmetrical distributions as can be seen by the median falling almost exactly in between the quartiles. The only possible exception is Tuesday, which has a slight right-hand skew. E, F - Are correct, since in general the weekday distributions overlap between themselves, as do the weekend distributions. Points Earned: 1/3 Correct Answer: A, E, F Your Response: A, C, F 21. A report says that "the median credit card debt of American households is zero." We know that many households have large amounts of credit card debt. Explain how the median debt can nonetheless be zero. Choose the most plausible explanation: A. The median debt can nonetheless be zero because it is not a resistant measure. B. The median debt is zero because the distribution is left-skewed. C. The median debt is zero because the first and the third quartiles are probably equal. D. The median debt is zero because more than half of credit card debts are zero. Households with no credit cards, as well as those which pay off the balance each month, have no credit card debt. 
If we list the credit card debt figures for all American households, more than half of the numbers in that list equal zero, so the median is zero. Points Earned: 1/1 Correct Answer: D Your Response: D This is a standard deviation contest. You must choose four numbers from the whole numbers 0 to 10, with repeats allowed.
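Because there are only 1,001 ways to pick the four numbers, the contest can be settled by brute force. This Python sketch (an illustration, not the text's method) enumerates every multiset of four whole numbers from 0 to 10 with `itertools.combinations_with_replacement` and computes the sample standard deviation with `statistics.stdev`, which divides by n − 1.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every way to pick four whole numbers from 0 to 10, repeats allowed.
# Order does not matter, so each multiset appears once, in rising order.
choices = list(combinations_with_replacement(range(11), 4))

s_values = {c: stdev(c) for c in choices}  # sample s, dividing by n - 1
smallest = min(s_values.values())
largest = max(s_values.values())

print(len(choices))  # prints 1001
print(smallest, [c for c, s in s_values.items() if s == smallest])
print(round(largest, 3), [c for c, s in s_values.items() if s == largest])
```

It finds 11 choices with s = 0 (four identical numbers) and a single choice, (0, 0, 10, 10), with the largest s, about 5.774.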
22. Choose four numbers that have the smallest possible standard deviation. What is s in this case? Round your answer to 3 decimal digits.
Answer As long as you choose four identical numbers, the standard deviation will be zero. Points Earned: 0/1 Correct Answer: 0.000 Your Response: 0,1,2,3
23. Is there more than one possibility for choosing four numbers that have the smallest possible standard deviation? A. Yes. B. No. As long as you choose four identical numbers, the standard deviation will be zero, leaving us with 11 possible choices in the range 0 to 10. Points Earned: 0/1 Correct Answer: A Your Response:
24. Choose four numbers that have the largest possible standard deviation. Match your choice of numbers below in rising order. Note that the number 0 is the 11th choice.
1. 1
2. 2
3. 3
4. 4
5. 5
6. 6
7. 7
8. 8
9. 9
10. 10
11. 0
A. First number (smallest). B. Second number. C. Third number. D. Fourth number (largest). See explanation in next question. Points Earned: 2/4 Correct Answer: A:11, B:11, C:10, D:10 Your Response: A:11, B:3, C:7, D:10
25. Is there more than one way to choose four numbers that give the largest possible standard deviation? A. Yes. B. No.
The choice that gives the maximal standard deviation (which turns out to be 5.774) is (0, 0, 10, 10). Let's see how we arrived at this result. To get the maximal standard deviation, the numbers should have the largest possible spread, so they should consist only of the numbers that are furthest apart, namely 0 and 10. This leaves us with three combinations to check: (0, 0, 0, 10), s = 5; (0, 0, 10, 10), s = 5.774; (0, 10, 10, 10), s = 5. Points Earned: 1/1 Correct Answer: B Your Response: B
26. What is the value of the largest possible standard deviation? Round your answer to 2 decimal digits.
Answer The choice of numbers for the maximal standard deviation is (0, 0, 10, 10); see the explanation in the previous question. These give a standard deviation of 5.77. Make sure that when calculating the standard deviation, you divide by (n − 1) and not by n as done by some calculators/software applications. See Example 2.7 for a detailed calculation of the standard deviation. Points Earned: 0/1 Correct Answer: 5.77 Your Response:
In 2007, the Boston Red Sox won the World Series for the second time in 4 years. The table below gives the salaries of the Red Sox players as of opening day of the 2007 season.
Table 2.2 Salaries for the 2007 Boston Red Sox World Series team
Player              Salary
Josh Beckett        $6,666,667
Alex Cora           $2,000,000
Coco Crisp          $3,833,333
Manny Delcarmen     $380,000
J.D. Drew           $14,400,000
Jacoby Ellsbury     $380,000
Eric Gagne          $6,000,000
Eric Hinske         $5,725,000
Bobby Kielty        $2,100,000
Jon Lester          $384,000
Javier Lopez        $402,000
Mike Lowell         $9,000,000
Julio Lugo          $8,250,000
Daisuke Matsuzaka   $6,333,333
Doug Mirabelli      $750,000
Hideki Okajima      $1,225,000
David Ortiz         $13,250,000
Jonathan Papelbon   $425,000
Dustin Pedroia      $380,000
Manny Ramirez       $17,016,381
Curt Schilling      $13,000,000
Kyle Snyder         $535,000
Mike Timlin         $2,800,000
Jason Varitek       $11,000,000
Kevin Youkilis      $424,000
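The summaries used in questions 28 to 30 and 46 can be checked with this Python sketch. It assumes the text's quartile rule (each quartile is the median of the observations on one side of the median, leaving out the middle value when n is odd); the helper names are hypothetical.

```python
# Opening-day 2007 salaries in dollars, copied from Table 2.2.
salaries = {
    "Josh Beckett": 6_666_667, "Alex Cora": 2_000_000, "Coco Crisp": 3_833_333,
    "Manny Delcarmen": 380_000, "J.D. Drew": 14_400_000, "Jacoby Ellsbury": 380_000,
    "Eric Gagne": 6_000_000, "Eric Hinske": 5_725_000, "Bobby Kielty": 2_100_000,
    "Jon Lester": 384_000, "Javier Lopez": 402_000, "Mike Lowell": 9_000_000,
    "Julio Lugo": 8_250_000, "Daisuke Matsuzaka": 6_333_333, "Doug Mirabelli": 750_000,
    "Hideki Okajima": 1_225_000, "David Ortiz": 13_250_000, "Jonathan Papelbon": 425_000,
    "Dustin Pedroia": 380_000, "Manny Ramirez": 17_016_381, "Curt Schilling": 13_000_000,
    "Kyle Snyder": 535_000, "Mike Timlin": 2_800_000, "Jason Varitek": 11_000_000,
    "Kevin Youkilis": 424_000,
}


def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Min, Q1, median, Q3, max; the middle value is excluded from both halves."""
    xs = sorted(values)
    half = len(xs) // 2
    return (xs[0], median(xs[:half]), median(xs),
            median(xs[len(xs) - half:]), xs[-1])


values = list(salaries.values())
summary = five_number_summary(values)
q1, q3 = summary[1], summary[3]
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 x IQR rule
outliers = [name for name, pay in salaries.items()
            if pay < low_fence or pay > high_fence]

print(summary)
print(f"mean = {sum(values) / len(values):,.0f}")
print(f"limits = ({low_fence:,.0f}, {high_fence:,.0f}), outliers = {outliers}")
```

It reports the five-number summary $380,000, $424,500, $2,800,000, $8,625,000, $17,016,381, a mean of about $5,066,389 (well above the median, as expected for right-skewed data), and no salaries beyond the upper limit of $20,925,750.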
27. Describe the distribution of salaries with a histogram using classes 2 million dollars wide. Which of the histograms below depicts the distribution correctly?
A. Histogram I. B. Histogram II. C. Histogram III. D. Histogram IV. The correct answer is Histogram II. Points Earned: 0/1 Correct Answer: B Your Response: C
28. Which numerical summary would you choose for these data? A. Mean and standard deviation. B. Five-number summary.
C. Both are equally suited. The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers). Points Earned: 1/1 Correct Answer: B Your Response: B
29. Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as "Not Relevant." Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore, if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n − 1) and not by n as done by some calculator/software applications.
1. $380,000
2. $850,000
3. $1,175,000
4. $424,500
5. $1,850,000
6. $2,800,000
7. $5,234,351
8. $4,630,838
9. $5,066,389
10. $8,625,000
11. $17,016,381
12. Not Relevant.
A. Mean. B. Standard deviation. C. Minimum. D. First Quartile. E. Median. F. Third Quartile. G. Maximum. The correct numerical measure is the five-number summary. Refer to examples 2.2 and 2.4 for explanations on how to calculate the median and quartiles. Points Earned: 3/7 Correct Answer: A:9, B:7, C:1, D:4, E:6, F:10, G:11 Your Response: A:-, B:-, C:1, D:-, E:-, F:10, G:11
30. Based on your graph and numerical summary, describe the distribution's main features. Mark the appropriate features below. A. Right skewed. B. Symmetrical. C. Left skewed. D. None of the answers are correct. E. There are outliers. F. There are no outliers. The distribution is best described as right skewed with no outliers: the largest salary, $17,016,381, is below the 1.5 × IQR upper limit of $8,625,000 + 1.5 × ($8,625,000 − $424,500) = $20,925,750. Points Earned: 0/2
Correct Answer: A, F Your Response: C How well have stocks done over the past generation? The Standard & Poor's 500 stock index describes the average performance of the stocks of 500 leading companies. Because the average is weighted by the total market value of each company's stock, the index emphasizes larger companies. Here are the real (that is, adjusted for the changing buying power of the dollar) returns on the S&P 500 for the years 1971 to 2006:
What can you say about the distribution of real returns on stocks? Follow the four-step process in your answer.
31. STATE: Which of the options below clearly states the practical question we are trying to answer from the available data? A. If you had $1 in the beginning of 1972, how many dollars would you have by the end of 2006? B. What is the likelihood of making a profit by investing in the stock market? C. How can we describe the distribution of returns on stocks (shape, center and spread)? D. Is it better to invest in large companies or in the smaller ones? The correct answer is C. The others are wrong for the following reasons: A - Even though we can get the answer from the data, it tells us nothing about the distribution of returns, which is what we're trying to describe. B - This still doesn't relate directly to the distribution of returns. D - This is not the question asked, and the data cannot provide an answer to it. Points Earned: 1/1 Correct Answer: C Your Response: C
32. FORMULATE: Which of the following statistical methods are relevant in this particular case? Select the applicable methods below. This is a general question; answer it in the context of the STATE step. A. Use numerical measures such as the five-number summary or the mean and standard deviation to describe the distribution. B. Plot the data using histograms or stemplots. C. Plot the data using a time plot. D. Use a pie chart to get a feeling for the shape of the distribution.
E. Use a bar graph to get a feeling for the shape of the distribution. F. Look for trends and cyclical behavior in the time plot. According to the STATE step, we are interested in describing the shape of the distribution. Therefore we first need to plot it using a histogram or stemplot (time plots, bar graphs and pie charts are not applicable to distributions), and then we can describe the distribution using numerical measures such as the mean and standard deviation or the five-number summary, depending on the exact shape of the distribution. Points Earned: 1/2 Correct Answer: A, B Your Response: B
33. SOLVE: Plot the data using a histogram with classes 10% wide. Compare your result to the histograms below and choose the correct one.
A. Histogram I. B. Histogram II. C. Histogram III. D. Histogram IV. Histogram I is the correct answer.
Points Earned: 0/1 Correct Answer: A Your Response: B
34. SOLVE (continued): Which numerical summary would you choose for these data? A. Mean and standard deviation. B. Five-number summary. C. Neither of the above. The distribution has a relatively regular single-peaked shape, and therefore numerical summaries are applicable. The skewness of the distribution means that the five-number summary is better suited than the mean and standard deviation (neither of which is resistant to skewed tails and outliers). Points Earned: 1/1 Correct Answer: B Your Response: B
35. SOLVE (continued): Calculate your chosen summary. Mark numerical measures that are not relevant to your numerical summary as "Not Relevant." Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore, if applicable, calculate it manually exactly as described by the procedures in the text. As for the standard deviation, if it's relevant, make sure that you calculate it as defined in the text, dividing by (n − 1) and not by n as done by some calculator/software applications.
1. -34.5400%
2. -5.4715%
3. -2.2640%
4. 7.9245%
5. 11.6770%
6. 17.7560%
7. 19.0085%
8. 22.4145%
9. 26.3105%
10. 26.5345%
11. 34.1670%
12. Not Relevant.
A. Mean. B. Standard deviation. C. Minimum. D. First Quartile. E. Median. F. Third Quartile. G. Maximum. The correct numerical measure is the five-number summary. Refer to examples 2.2 and 2.4 for explanations on how to calculate the median and quartiles. Points Earned: 0/7 Correct Answer: A:12, B:12, C:1, D:2, E:5, F:8, G:11 Your Response: A:-, B:-, C:-, D:-, E:-, F:-, G:
36. CONCLUDE: Which of the following are conclusions you can draw based on your statistical analysis?
A. The distribution is right skewed, just like most economic variables. B. If you invested $1 in the stock market in 1972, by 2006 you would have $7.69. C. On average, bigger companies have higher returns than small ones. D. The distribution has a left skew. E. The center of the stock market returns distribution is positive. F. In more than half of the surveyed years, the stock returns were above 10%. The correct answers are D, E, and F. Answer F is a direct consequence of the median being 11.677%. A - Is wrong, since the distribution is left-skewed. The rest of the answers are not relevant and/or do not answer the question from the STATE step. Some of them jump ahead to conclusions that cannot be based on the data at hand. Points Earned: 0/3 Correct Answer: D, E, F Your Response:
People gain weight when they take in more energy from food than they expend. Table 2.4 compares volunteer subjects who were lean with others who were mildly obese.
None of the subjects followed an exercise program. The subjects wore sensors that recorded every move for 10 days. The table shows the average minutes per day spent in activity (standing and walking) and in lying down. Compare the distributions of time spent actively for lean and obese subjects and also the distributions of time spent lying down. How does the behavior of lean and mildly obese people differ? 37. State: Which of the options below clearly states the practical question we are trying to answer from the available data? A. Do lean people spend more energy than obese people in daily activities? B. How do lean and obese people differ in time spent in activity and in time spent lying down? C. Are there differences in time spent by each group in the two activities?
D. Compare the two groups for the difference between energy they take from food and the energy they expend in daily activities. State: How do lean and obese people differ in time spent in activity and in time spent lying down? Points Earned: 0/1 Correct Answer: B Your Response: 38. Plan: Which of the options below is most appropriate for planning your statistical analysis? A. Compare each pair of distributions using graphs. B. Compare each pair of distributions using graphs, means and standard deviations. C. Compare each pair of distributions using numerical summaries. D. Compare each pair of distributions by first using graphs and then numerical summaries. Plan: We will compare each pair of distributions using graphs and numerical summaries. Points Earned: 0/1 Correct Answer: D Your Response: 39. Solve: Draw back-to-back stemplots. Choose the option that best describes your stemplots. A. None of the stemplots show any particular skewness. B. None of the stemplots show any particular skewness but there are some outliers. C. The distributions are sharply skewed to the left but no outliers are apparent. D. The "Time active-lean" group is considerably skewed, but the other distributions are quite symmetric. Solve: Below are two back-to-back stemplots; histograms or boxplots could also be used. None of the stemplots show any particular skewness.
Points Earned: 0/1 Correct Answer: A Your Response:
40. Solve: Which of the options below is the most appropriate numerical summary for these data? A. Five-number summary. B. Means and standard deviations. C. Medians and standard deviations. D. Five-number summary and means and standard deviations. Since none of the distributions show particular skewness, either means and standard deviations or five-number summaries would be suitable. Points Earned: 1/1 Correct Answer: D Your Response: D 41. Conclude: The means, standard deviations and five-number summaries of the distributions are shown below:
What is your conclusion based on this analysis? True or False: "There is no noticeable difference between the two groups of people, in time spent in activity and in time spent lying down." Answer Conclude: In both the stemplots and the numerical summaries, we observe that lean subjects spent more active time than the obese subjects. There was little difference in time spent lying down. Points Earned: 1/1 Correct Answer: False Your Response: False The table below gives carbon dioxide (CO2) emissions per person for countries with population at least 20 million. A stemplot or histogram shows that the distribution is strongly skewed to the right. The United States and several other countries appear to be high outliers.
42. Give the five-number summary. Note that the five-number summary may vary slightly depending on the definitions used by different calculator/software applications. Therefore calculate it manually exactly as described by the procedures in the text. Match your answers below. The values are CO2 emissions in metric tons per person.
1. 0.1
2. 0.55
3. 0.95
4. 0.85
5. 2.50
6. 2.85
7. 3.3
8. 3.95
9. 4.60
10. 4.85
11. 7.4
12. 19.6
A. Minimum. B. First Quartile. C. Median. D. Third Quartile. E. Maximum. Refer to Exercise 1.36 for more information. Points Earned: 3/5 Correct Answer: A:1, B:3, C:7, D:11, E:12 Your Response: A:1, B:3, C:5, D:9, E:12 43. Does the five-number summary suggest that the distribution is right-skewed? Explain. A. No, one cannot get any indication of a distribution's skewedness without making a stemplot or histogram. B. No, in order to see a skew, we need the mean and standard deviation. C. Surprisingly, the numbers indicate a left skew.
D. Yes, one can see that a distribution is skewed by the position of the median relative to the quartiles. In this case the median is closer to the first quartile, indicating a right-hand skew. D is the correct answer. A - While it is true that a plot gives more information than a numerical summary, the five-number summary contains enough information to give an indication of a distribution's center, spread and skew, as D explains. B - The mean and standard deviation give no indication of a distribution's skew. As D explains, the five-number summary does. C - Is wrong, see the explanation in D. Points Earned: 1/1 Correct Answer: D Your Response: D
44. Below is a stemplot of the carbon dioxide emissions distribution. It suggests that a few countries are outliers. How many countries are outliers according to the 1.5 × IQR rule?
A. No countries. B. 1 country. C. 2 countries. D. 3 countries. E. 4 countries. The 1.5 × IQR rule limits for outliers are calculated as follows: First we calculate the IQR from the quartiles IQR = Q3 − Q1 = 7.05 Next we calculate the limits: Lower limit = Q1 − 1.5 × IQR = -9.825 Upper limit = Q3 + 1.5 × IQR = 18.375
Only the United States falls outside these limits and therefore there is only one outlier according to the 1.5 × IQR rule. See Example 2.6 for more details. Points Earned: 1/1 Correct Answer: C Your Response: C 45. Do the 1.5 × IQR rule’s suggestions about which countries are and are not outliers match what you see in the stemplot? A. Yes. B. No. The plot shows that there are 3 outliers, Australia, Canada and the United States. On the other hand, the rule points out only the United States as an outlier. Points Earned: 0/1 Correct Answer: B Your Response: A The table below gives the salaries of the Red Sox players as of opening day of the 2007 season.
46. Which members of the Boston Red Sox have salaries that are suspected outliers by the 1.5 × IQR rule? Match your answers below. Make sure you calculate the quartiles as defined by the text. 1. Is an outlier. 2. Is not an outlier. A. Josh Beckett B. Curt Schilling C. David Ortiz The quartiles are Q1 = $424,500 and Q3 = $8,625,000. The 1.5 × IQR rule limits for outliers are calculated as follows: First we calculate the IQR from the quartiles, IQR = Q3 − Q1 = $8,200,500. Next we calculate the upper limit, Q3 + 1.5 × IQR = $8,625,000 + $12,300,750 = $20,925,750 (the lower limit is negative, so no salary can fall below it). Outliers are those salaries above $20,925,750; there are no such salaries. Points Earned: 2/3
Correct Answer: A:2, B:2, C:2 Your Response: A:1, B:2, C:2
How well have stocks done over the past generation? The Wilshire 5000 index describes the average performance of all U.S. stocks. The average is weighted by the total market value of each company's stock, so think of the index as measuring the performance of the average investor. Here are the percent returns on the Wilshire 5000 index for the years 1971 to 2006:
Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982
Return 16.19 17.34 -18.78 -27.87 37.38 26.77 -2.97 8.54 24.40 33.21 -3.98 20.43
Year 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
Return 22.71 3.27 31.46 15.61 1.75 17.59 28.53 -6.03 33.58 9.02 10.67 0.06
Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Return 36.41 21.56 31.48 24.31 24.23 -10.89 -10.97 -20.86 31.64 12.48 6.38 15.77
47. The returns on stocks vary a lot: they range from a loss of more than 27% to a gain of more than 37%. Are any of these years suspected outliers by the 1.5 × IQR rule? Match your answers below. Calculate the quartiles as defined by the text. 1. Is an outlier. 2. Is not an outlier. A. 1995 B. 1997 C. 2002 D. 1974 The quartiles are Q1 = 0.905% and Q3 = 25.585%. The 1.5 × IQR rule limits for outliers are calculated as follows: First we calculate the IQR from the quartiles, IQR = Q3 − Q1 = 24.68%. Next we calculate the limits: Lower limit = Q1 − 1.5 × IQR = -36.115%; Upper limit = Q3 + 1.5 × IQR = 62.605%. Every return lies between these limits (the smallest is -27.87% in 1974 and the largest is 37.38% in 1975), so there are no outliers according to the 1.5 × IQR rule. See Exercise 2.44 for more details. Note that the quartiles were calculated according to the definitions in the text. Points Earned: 2/4 Correct Answer: A:2, B:2, C:2, D:2 Your Response: A:2, B:2, C:1, D:1
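The same 1.5 × IQR check for the Wilshire 5000 returns can be sketched in Python, again assuming the text's median-of-halves definition of the quartiles; the helper names are hypothetical.

```python
# Percent returns on the Wilshire 5000 index, 1971 to 2006, from the table above.
returns = dict(zip(range(1971, 2007), [
    16.19, 17.34, -18.78, -27.87, 37.38, 26.77, -2.97, 8.54, 24.40, 33.21, -3.98, 20.43,
    22.71, 3.27, 31.46, 15.61, 1.75, 17.59, 28.53, -6.03, 33.58, 9.02, 10.67, 0.06,
    36.41, 21.56, 31.48, 24.31, 24.23, -10.89, -10.97, -20.86, 31.64, 12.48, 6.38, 15.77,
]))


def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


xs = sorted(returns.values())
half = len(xs) // 2
q1 = median(xs[:half])               # median of the lower 18 returns
q3 = median(xs[len(xs) - half:])     # median of the upper 18 returns
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [year for year, r in returns.items() if r < lower or r > upper]

print(f"Q1 = {q1:.3f}%, Q3 = {q3:.3f}%, IQR = {iqr:.2f}%")
print(f"limits = ({lower:.3f}%, {upper:.3f}%), outliers = {outliers}")
```

It gives Q1 = 0.905%, Q3 = 25.585%, limits of -36.115% and 62.605%, and no outlying years.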