4 Solutions to Exercises


4.1 About these solutions

The solutions that follow were prepared by Darryl K. Nester. I occasionally pillaged or plagiarized solutions from the second edition (prepared by George McCabe), but I take full responsibility for any errors that may remain. Should you discover any errors or have any comments about these solutions (or the odd answers, in the back of the text), please report them to me:

Darryl Nester

Bluffton College

Bluffton, Ohio 45817

email: nesterd@bluffton.edu

WWW:

4.2 Using the table of random digits

Grading SRSs chosen from the table of random digits is complicated by the fact that students can find some creative ways to (mis)use the table. Some approaches are not mistakes, but may lead to different students having different "right" answers. Correct answers will vary based on:

• The line in the table on which they begin (you may want to specify one if the text does not).

• Whether they start with, e.g., 00 or 01.

• Whether or not they assign multiple labels to each unit.

• Whether they assign labels across the rows or down the columns (nearly all lists in the text are alphabetized down the columns).

Some approaches can potentially lead to wrong answers. Mistakes to watch out for include:

• They may forget that all labels must be the same length, e.g., assigning labels like 0, 1, 2, ..., 9, 10, ... rather than 00, 01, 02, ....

• In assigning multiple labels, they may not give the same number of labels to all units. E.g., if there are 30 units, they may try to use up all the two-digit numbers, thus assigning 4 labels to the first ten units and only 3 to the remaining twenty.
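
The equal-labels rule can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the function `choose_srs`, the numbering of units from 0, and the digit string, which is made up and is not a line from the table in the text. With 30 units and two-digit labels, each unit receives exactly 3 labels (00 to 89), and the labels 90 to 99 are skipped.

```python
def choose_srs(digits, n_units, sample_size, label_width=2):
    """Read fixed-width labels from a string of random digits and return
    the first `sample_size` distinct units selected (units numbered from 0)."""
    # Give every unit the same number of labels; leftover labels are unused.
    labels_per_unit = 10 ** label_width // n_units
    usable = labels_per_unit * n_units  # labels >= usable are skipped

    chosen = []
    for i in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[i:i + label_width])
        if label >= usable:
            continue  # e.g. 90-99 when there are 30 units
        unit = label % n_units  # labels 07, 37, 67 all point to unit 7
        if unit not in chosen:  # ignore repeats of a unit already chosen
            chosen.append(unit)
        if len(chosen) == sample_size:
            break
    return chosen


# Hypothetical digits, read in pairs: 07, 31, 94, 45, 02
print(choose_srs("0731944502", n_units=30, sample_size=3))  # [7, 1, 15]
```

Here 94 is skipped because it lies outside the 90 usable labels, and 31 maps to unit 1. A student who instead used all 100 two-digit labels would give some units a higher chance of selection than others.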

4.3 Using statistical software

The use of computer software or a calculator is a must for all but the most cursory treatment of the material in this text. Be aware of the following considerations:


• Standard deviations: Students may easily get confused by software which gives both the so-called "sample standard deviation" (the one used in the text) and the "population standard deviation" (dividing by $n$ rather than $n - 1$). Symbolically, the former is usually given as "$s$" and the latter as "$\sigma$" (sigma), but the distinction is not always clear. For example, many computer spreadsheets have a command such as "STDEV(...)" to compute a standard deviation, but you may need to check the manual to find out which kind it is. As a quick check: for the numbers 1, 2, 3, $s = 1$ while $\sigma \approx 0.8165$. In general, if two values are given, the larger one is $s$ and the smaller is $\sigma$. If only one value is given, and it is the "wrong" one, use the relationship $s = \sigma \sqrt{\frac{n}{n-1}}$.

• Quartiles and five-number summaries: Methods of computing quartiles vary between different packages. Some use the approach given in the text (that is, Q1 is the median of all the numbers below the location of the overall median, etc.), while others use a more complicated approach. For the numbers 1, 2, 3, 4, for example, we would have Q1 = 1.5 and Q3 = 3.5, but Minitab reports these as 1.25 and 3.75, respectively. Since I used Minitab for most of the analysis in these solutions, this was sometimes a problem. However, I remedied the situation by writing a Minitab macro to compute quartiles the IPS way. (In effect, I was "dumbing down" Minitab, since its method is more sophisticated.) This and other macros are available at my website.

• Boxplots: Some programs which draw boxplots use the convention that the "whiskers" extend to the lower and upper deciles (the 10th and 90th percentiles) rather than to the minimum and maximum. (DeltaGraph, which I used for most of the graphs in these solutions, is one such program. It took some trickery on my part to convince it to make them as I wanted them.) While the decile method is merely different from that given in the text, some methods are (in my opinion) just plain wrong. Some graphing calculators from Sharp draw "box charts," which have a center line at the mean (not the median), and a box extending from $\bar{x} - \sigma$ to $\bar{x} + \sigma$! I know of no statistics text that uses that method.
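
The two quick checks described above (standard deviations and quartiles) can be reproduced with a short sketch using Python's standard `statistics` module. The helper `text_quartiles` is a hypothetical name for the text's rule (median of the values below and above the location of the overall median). The `"exclusive"` interpolation method is used only because it happens to give the interpolated values for 1, 2, 3, 4; no claim is made that it matches any particular package's algorithm in general.

```python
import math
import statistics

# Standard deviations for the numbers 1, 2, 3
data = [1, 2, 3]
n = len(data)
s = statistics.stdev(data)       # sample SD: divides by n - 1
sigma = statistics.pstdev(data)  # population SD: divides by n
converted = sigma * math.sqrt(n / (n - 1))  # recover s from sigma

print(s, round(sigma, 4), round(converted, 4))  # 1.0 0.8165 1.0


def text_quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves,
    leaving out the overall median when the count is odd."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return statistics.median(lower), statistics.median(upper)


print(text_quartiles([1, 2, 3, 4]))  # (1.5, 3.5)

# An interpolating method gives different quartiles for the same data.
print(statistics.quantiles([1, 2, 3, 4], n=4, method="exclusive"))
# [1.25, 2.5, 3.75]
```

If a package reports only one standard deviation, comparing its output for 1, 2, 3 against the two values printed here shows which divisor it uses.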

4.4 Acknowledgments

I should mention the software I used in putting these solutions together:

• For typesetting: TeX (specifically, Textures, from Blue Sky Software).

• For the graphs: DeltaGraph (SPSS), Adobe Illustrator, and PSMathGraphs II (MaryAnn Software).

• For statistical analysis: Minitab, G•Power, JMP IN, and GLMStat (the latter two mostly for Chapters 14 and 15). George McCabe supplied output from SAS for Chapter 15. G•Power is available as freeware on the Internet, while GLMStat is shareware. Additionally, I used the TI-82, TI-85, TI-86, and TI-92 calculators from Texas Instruments.


Chapter 1 Looking at Data - Distributions

Chapter 1 Solutions

Section 1: Displaying Distributions with Graphs

1.1 (a) Categorical. (b) Quantitative. (c) Categorical. (d) Categorical. (e) Quantitative. (f) Quantitative.

1.2 Gender: categorical. Age: quantitative. Household income: quantitative. Voting Democratic/Republican: categorical.

1.3 The individuals are vehicles (or "cars"). Variables: vehicle type (categorical), where made (categorical), city MPG (quantitative), and highway MPG (quantitative).

1.4 Possible answers (unit; instrument):

• number of pages (pages; eyes)

• number of chapters (chapters; eyes)

• number of words (words; eyes [likely bloodshot after all that counting])

• weight or mass (pounds/ounces or kilograms; scale or balance)

• height and/or width and/or thickness (inches or centimeters; ruler or measuring tape)

• volume (cubic inches or cubic centimeters; ruler or measuring tape [and a calculator])

Any one of the first three could be used to estimate the time required to read the book; the last two would help determine how well the book would fit into a book bag.

1.5 A tape measure (the measuring instrument) can be used to measure (in units of inches or centimeters) various lengths such as the longest single hair, length of hair on sides or back or front. Details on how to measure should be given. The case of a bald (or balding) person would make an interesting class discussion.

1.6 Possible answers (reasons should be given): unemployment rate, average (mean or median) income, quality/availability of public transportation, number of entertainment and cultural events, housing costs, crime statistics, population, population density, number of automobiles, various measures of air quality, commuting times (or other measures of traffic), parking availability, taxes, quality of schools.

1.7 For (a), the number of deaths would tend to rise with the increasing population, even if cancer treatments become more effective over time: Since there are more people, there are more potential cases of cancer. Even if treatment is more effective, the increasing cure rate may not be sufficient to overcome the rising number of cases. For (b), if treatments for other diseases are also improving, people who might have died from other causes would instead live long enough to succumb to cancer.


Even if treatments were becoming less effective, many forms of cancer are detected earlier as better tests are developed. In measuring five-year survival rates for (c), if we can detect cancer (say) one year earlier than was previously possible, then effectively, each patient lives one year longer after the cancer is detected, thus raising the five-year survival rate.

1.8 (a) 1988: $\frac{949}{24{,}800{,}000} \approx 0.00003827$ = 38.27 deaths per million riders. 1992: $\frac{903}{54{,}632{,}000} \approx 0.00001653$ = 16.53 deaths per million riders. Death rates are less than half what they were; bicycle riding is safer. (b) It seems unlikely that the number of riders more than doubled in a six-year period.
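
The arithmetic in 1.8 can be checked with a few lines of Python. The counts are the ones quoted in the solution; the function name `deaths_per_million` is a hypothetical helper introduced only for this sketch.

```python
def deaths_per_million(deaths, riders):
    """Death rate expressed per one million riders."""
    return deaths / riders * 1_000_000


rate_1988 = deaths_per_million(949, 24_800_000)
rate_1992 = deaths_per_million(903, 54_632_000)

print(round(rate_1988, 2))  # 38.27
print(round(rate_1992, 2))  # 16.53

# (a) The later rate is less than half the earlier one.
print(rate_1992 / rate_1988 < 0.5)  # True

# (b) The drop comes almost entirely from the reported number of riders,
# which more than doubled while deaths barely changed.
print(round(54_632_000 / 24_800_000, 2))  # 2.2
```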

1.9 Using the proportion or percentage of repairs, Brand A is more reliable: $\frac{2942}{13{,}376} \approx 0.22 = 22\%$ for Brand A, and $\frac{192}{480} = 0.4 = 40\%$ for Brand B.

1.10 (a) Student preferences may vary; be sure they give a reason. Method 1 is faster, but less accurate: it will only give values that are multiples of 10. (b) In either method 1 or 2, fractions of a beat will be lost; for example, we cannot observe 7.3 beats in 6 seconds, only 7. The formula $60 \times 50 \div t$, where $t$ is the time needed for 50 beats, would give a more accurate rate since the inaccuracy is limited to the error in measuring $t$ (which can be measured to the nearest second, or perhaps even more accurately).

1.11 Possible answers are total profits, number of employees, total value of stock, and total assets.
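
A small sketch makes the comparison in 1.10(b) concrete. It assumes, based on the solution's wording, that method 1 counts whole beats in 6 seconds and multiplies by 10; the "true" pulse of 73 beats per minute is a made-up value chosen so that 7.3 beats fall in 6 seconds.

```python
import math

true_rate = 73  # hypothetical true pulse, in beats per minute

# Method 1 (assumed): count whole beats in 6 seconds, multiply by 10.
beats_in_6s = math.floor(true_rate * 6 / 60)  # 7.3 beats, observed as 7
method1_rate = beats_in_6s * 10

# Timing method: time 50 beats, measured to the nearest second,
# then apply 60 * 50 / t.
t = round(50 / true_rate * 60)  # about 41.1 s, recorded as 41
timed_rate = 60 * 50 / t

print(method1_rate)          # 70
print(round(timed_rate, 2))  # 73.17
```

Under these assumptions the counting method is off by 3 beats per minute, while the timing method is off by less than 0.2.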

1.12 (a) Yes: The sum of the ethnic group counts is 12,261,000. (b) A bar graph or pie chart (not recommended) may be used. In order to see the contrast of the heights of the bars, the chart needs to be fairly tall.

[Bar graph. Vertical axis: number of students (thousands), 0 to 9000. Bars: American Indian, Asian, Black non-Hispanic, Hispanic, white non-Hispanic, Foreign.]


1.13 (a) Shown at right. The bars are given in the same order as the data in the table (the most obvious way), but that is not necessary, since the variable is nominal, not ordinal. (b) A pie chart would not be appropriate, since the different entries in the table do not represent parts of a single whole.

[Bar graph. Vertical axis: percent of female doctorates, 0 to 60. Bars: Comp. Sci., Life Sci., Educ., Engin., Phys. Sci., Psych.]

1.14 (a) Below. For example, "Motor Vehicles" is 46% since $\frac{41{,}893}{90{,}523} = 0.4627\ldots$. The "Other causes" category is needed so that the total is 100%. (b) Below. The bars may be in any order. (c) A pie chart could also be used, since the categories represent parts of a whole (all accidental deaths).

Cause            Percent
Motor vehicles     46
Falls              15
Drowning            4
Fires               4
Poisoning           8
Other causes       23

[Bar graph. Vertical axis: percent of accidental deaths. Bars: Motor Vehicle, Falls, Drowning, Fires, Poison, Other Causes.]
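
The percentages in 1.14 can be checked as follows. Only the two counts quoted in the solution (41,893 motor vehicle deaths out of 90,523 accidental deaths) are used as raw data; the remaining values are the rounded percents from the table, and the variable names are introduced only for this sketch.

```python
# Share of accidental deaths due to motor vehicles, from the quoted counts
motor_share = 41_893 / 90_523
print(round(motor_share * 100))  # 46

# Rounded percents for the named causes, as given in the table
named_causes = {
    "Motor vehicles": 46,
    "Falls": 15,
    "Drowning": 4,
    "Fires": 4,
    "Poisoning": 8,
}

# "Other causes" is whatever is needed to bring the total to 100%.
other_causes = 100 - sum(named_causes.values())
print(other_causes)  # 23
```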

1.15 Figure 1.10(a) is strongly skewed to the right with a peak at 0; Figure 1.10(b) is somewhat symmetric with a central peak at 4. The peak is at the lowest value in 1.10(a), and at a central value in 1.10(b).

1.16 The distribution is skewed to the right with a single peak. There are no gaps or outliers.

1.17 There are two peaks. Most of the ACT states are located in the upper portion of the distribution, since in such states, only the stronger students take the SAT.

1.18 The distribution is roughly symmetric. There are peaks at .230 to .240 and .270 to .290. The middle of the distribution is about .270. Ignoring the outlier, the range is about .345 - .185 = .160 (or .350 - .180 = .170).

1.19 Sketches will vary. The distribution of coin years would be left-skewed because newer coins are more common than older coins.
