SELECTING SAMPLES (Day 2)

Example 2-1: Sampling Words

(a) Circle 10 representative words in the above passage.

The authorship of several literary works is often a topic for debate. Were some of the works attributed to William Shakespeare actually written by Francis Bacon or Christopher Marlowe? Which of the anonymously published Federalist Papers were written by Alexander Hamilton, which by James Madison, which by John Jay? Who were the authors of the writings contained in the Bible? The field of literary computing began to find ways of numerically analyzing authors' works, looking at variables such as sentence length and rates of occurrence of specific words.

The above passage is, of course, Lincoln's Gettysburg Address, given November 19, 1863 on the battlefield near Gettysburg, PA. In characterizing this passage, we could have asked you to examine each word. Instead, we asked you to look at a subset of the words of the passage. We are considering this passage a population of words, and the 10 words you selected are considered a sample from this population. In most studies, we do not have access to the entire population and can only consider results for a sample from that population. The goal is to learn something about a very large population (e.g., all American adults, all American registered voters) by studying a sample. The key is in carefully selecting the sample so that the results in the sample are representative of the larger population.

The population is the entire collection of observational units that we are interested in examining. A sample is a subset of observational units from the population. Keep in mind that the observational units are objects or people; we then need to decide what variable we want to measure about them.
(b) Consider the following variables:
    length of word
    whether or not word length > 4 characters
Classify each of these variables as quantitative or categorical.

(c) Record the data from your sample for the above variables:

word   |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10
length |    |    |    |    |    |    |    |    |    |
long?  |    |    |    |    |    |    |    |    |    |

(d) Do you think the words you selected are representative of the 268 words in this passage?

(e) How many long words (which we will define for now as having more than 4 characters) were there in your sample? What proportion of your sample consisted of long words?
Reminder: The variable here is whether or not the word is long, not how many words are long. We will soon have a different term for this numerical result.

(f) There are 99 long words in the population. What proportion of the population is long words?

(g) Did your sample proportion exceed the population proportion? How many students in your class exceeded this proportion?

(h) If we were to repeat this exercise in different classes, do you think we would see similar results? Explain.

(i) Explain why this sampling method (asking people to choose 10 representative words) is biased and how this bias is exhibited. Also identify the direction of the bias. In other words, does the sampling method tend to overestimate or underestimate the proportion of long words?

Key Idea: We use information collected from a sample to learn about the entire population. We want the sample to be representative of (have the same characteristics as) the population of interest.

In order to avoid biased samples, we need to make sure the sampling method is equitable and that there is a known probability for each member of the population to be selected. While the principle of simple random sampling is probably clear, it is by no means simple to implement. One approach is to use a computer-generated table of random digits.
Such a table is constructed so that each position is equally likely to be occupied by any one of the digits 0-9, and so that the value of any one position has no impact on the value of any other position. A table of random digits can be found in the back of the book (Table B). The first column in the random number table gives a row number for you to refer to. It is often convenient to read across a line, but you can begin anywhere on a line and move in any direction. If you need more digits, just continue to the next line.

Example: Suppose we have a class of 55 students. Use Line 140 to select 6 two-digit numbers between 01 and 55:

Line 140: 12975 13258 13048 45144

We would read off the following numbers:
12, 97, 51, 32, 58, 13, 04, 84, 51, 44
But we would have to toss out 97, 58, and 84, since they don't match up with students in the class. We would also toss out the second occurrence of 51. This leaves us with students
12, 51, 32, 13, 04, 44

Example: Sampling Words (cont.)

(a) The first step is to obtain a sampling frame in which each member of the population can be assigned a number. Examples include vehicle registration lists, phone books, voter registration lists, and the Cal Poly student registrar's list. Here we just need to number the words in the above passage. Open the webpage: statweb.calpoly.edu/chance/stat217/address.html

(b) Use the table of random digits to select a simple random sample of 5 words from the Gettysburg Address. Do this by entering the table at any point (it does not have to be at the beginning of a line) and reading off three-digit numbers between 001 and 268. (Disregard any numbers not in this range. If you happen to get repeats, keep going until you have five different three-digit numbers.) Continue until you have 5 numbers corresponding to words in this population. Report the words you selected and determine whether or not they are long.

Line Number Used:
Three-digit numbers:

       |  1 |  2 |  3 |  4 |  5
word   |    |    |    |    |
length |    |    |    |    |
long?  |    |    |    |    |
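The table-reading procedure from the class-of-55 example can also be written out as a short program. The following is only a sketch in Python: the function name select_from_digits and its parameters are made up for illustration and do not come from the textbook or the course webpage. It reads fixed-width numbers off a line of random digits, discards values outside the allowed range, and skips repeats.

```python
def select_from_digits(digit_line, width, low, high, count):
    """Read `width`-digit numbers from a line of random digits.

    Keeps only numbers between `low` and `high` (inclusive), skips repeats,
    and stops once `count` numbers have been kept. Hypothetical helper,
    written for this handout's example only.
    """
    # Drop the spaces that separate the groups of five digits in the table.
    digits = "".join(ch for ch in digit_line if ch.isdigit())
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if low <= number <= high and number not in chosen:
            chosen.append(number)
        if len(chosen) == count:
            break
    return chosen


line_140 = "12975 13258 13048 45144"
students = select_from_digits(line_140, width=2, low=1, high=55, count=6)
print(students)  # [12, 51, 32, 13, 4, 44]
```

For the Gettysburg Address sample in (b), the same sketch would be called with width=3, low=1, high=268, and count=5 on whichever line of the table you entered. If one line runs out of digits before enough numbers are found, the function simply returns fewer than count values, and you would continue with the next line as described above.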
While we don't expect to match the population proportion exactly, we should see that we err equally on each side instead of systematically overestimating the population proportion.

(c) How many students in your class obtained a sample proportion that was larger than the population proportion?

(d) To really examine the long-term patterns of this sampling method, we will use technology to take many, many samples. From the course webpage, follow the links for Java applets, and select the Sampling Words applet (http://statweb.calpoly.edu/chance/applets/applets.html). The information in the top right panel tells you the number of letters per word in the population, the proportion of long words, and the proportion of short words (defined as having 4 or fewer letters). Click off the Shows Words and Show Noun boxes so only the long vs. not long graph is displayed. Specify 5 as the sample size and click Draw Samples. The applet randomly selects 5 words, just as you did above, and reports the sample in the top left window. Record the words and whether or not they were long for this sample:

       |  1 |  2 |  3 |  4 |  5
word   |    |    |    |    |
length |    |    |    |    |
long?  |    |    |    |    |

(e) Click Draw Samples again. Did you obtain the same sample of words this time?

(f) Change the Number of samples (Num samples) from 1 to 100. Click the Draw Samples button. The applet now takes 100 different simple random samples from the population.
Key observation: There is variability in the results from sample to sample.
The applet adds each sample proportion to the graph in the lower right panel. The red arrow reports the average value of these 100 sample proportions. Record this value below.
Average of 100 sample proportions:

(g) If the sampling method is unbiased, the sample proportions should be centered around the population proportion of 0.37 (denoted by the grey vertical line). Does this appear to be the case?

(h) Change the sample size from 5 to 10. Click off the Animate button and click on Draw Samples.
Average of 100 sample proportions (with sample size 10):
Produce a rough sketch of the distribution of these different proportions:
How does this distribution (in black) compare to the previous one (in green)?

(i) Does the sampling method still appear to be unbiased? What has changed about the type of sample proportions that we obtain? Why does this make sense?

(j) Click the Reset button. Lower on the page you will see a menu that currently says address. Pull down the menu and select four addresses. Now your population consists of 4 copies of the Gettysburg Address (4 x 268 = 1072 words), so it is four times larger than it used to be (but the population characteristics are the same). Click the Draw Samples button. How does this distribution compare to the one you sketched in the previous question?

Four score and seven years ago, our fathers brought forth upon this continent a new nation: conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war. We have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember, what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced.
It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion, that we here highly resolve that these dead shall not have died in vain, that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.

A simple random sample gives every observational unit in the population the same chance of being selected. In fact, it gives every sample of size n the same chance of being selected. So any set of 10 words is just as likely to end up in our sample as any other set of 10 words. We often abbreviate this method as SRS.

When a simple random sample is used, we are allowed to generalize results from our sample to the larger population. While we expect some variability in our results, there is a predictable pattern to the variation. On the other hand, if the sampling method is biased, we can make no claims about the population. In this example, we were able to compare the sample to the population, but that is not usually possible. Thus, it is very important to determine whether or not the sample was selected at random before we can believe that the sample results are representative of the population.

Once we have a representative sample, we can improve the precision by increasing the sample size. With larger random samples, the results will tend to fall even closer to the population results.

A rather counterintuitive, but very crucial, fact is that when determining how representative your sample is, and how close your sample results should be to the population result, the size of the population does not matter! This is why organizations like Gallup can state poll results about the entire country based on samples of just 1,000 to 2,000 respondents, as long as those respondents are randomly selected.
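The three claims above (random samples are unbiased, larger samples are more precise, and population size does not matter) can be checked with a small simulation. The sketch below is written in Python and is not the Sampling Words applet; the function names, the choice of 10,000 samples, and the fixed seed are all assumptions made for illustration. It uses only the two population facts given in this handout: 268 words, of which 99 are long.

```python
import random
import statistics


def make_population(copies=1, total=268, long_words=99):
    """Represent each word by True (long) or False (not long)."""
    one_copy = [True] * long_words + [False] * (total - long_words)
    return one_copy * copies


def summarize(population, sample_size, num_samples=10_000, seed=0):
    """Draw many simple random samples; return the mean and the standard
    deviation of the resulting sample proportions of long words."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    proportions = [
        # rng.sample draws without replacement, as in a simple random sample
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(num_samples)
    ]
    return statistics.mean(proportions), statistics.pstdev(proportions)


one_address = make_population(copies=1)    # 268 words
four_addresses = make_population(copies=4)  # 1072 words, same proportion long

print("population proportion:", round(99 / 268, 3))
for label, population, n in [
    ("1 address, n = 5", one_address, 5),
    ("1 address, n = 10", one_address, 10),
    ("4 addresses, n = 10", four_addresses, 10),
]:
    mean, spread = summarize(population, n)
    print(f"{label}: mean = {mean:.3f}, spread = {spread:.3f}")
```

In each case the mean of the sample proportions should land near the population proportion, the spread should shrink when the sample size goes from 5 to 10, and the spread should barely change when the population is quadrupled.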