Sampling and Data Collection Ideas

Very soon we will be ready to apply material from probability theory to make statistical inferences based on samples. We begin by briefly looking at why and how a sample is taken.

Several reasons can make sampling superior to a census (examining the entire population). These include:

Cost: In many situations the cost of performing a census is prohibitive. For example, imagine the cost of examining every transaction of a business when performing an audit. As another example, consider the cost of a market survey that attempts to take a census of a large customer population.

Destructive testing: In many industrial applications, the items sampled are destroyed or made economically worthless by the sampling procedure. For example, the reliability of electronic circuits may be evaluated by measuring the time until they fail. Obviously, it is impossible to take a census of the population of electronic circuits, since testing every circuit would destroy them all.

Speed of analysis: A significant benefit of sampling is that the collected data can be processed and analyzed in a timely fashion. This can be very important in market surveys, where customer attitudes can change quickly.

Accuracy of analysis: As you will see in our study of frequency distributions, it is sometimes better to concentrate on a small amount of information than to try to integrate a large amount. This is especially true when the data are difficult to collect accurately: the person performing the study can concentrate on a subset of the population and, at times, ask more probing questions.

Feasibility: In some cases it is literally impossible to take a census. One very important example is an infinite population.
Types of Samples

Nonrandom sampling

Convenience sampling: Select the sample with ease of sampling as the primary consideration. What is wrong (if anything) with taking a telephone survey between 11 a.m. and 1 p.m.?

Judgment sampling: Select the sample based on past experience with the population. Why survey someone whom you already know will not respond?

Random sampling

Simple random sample: A sample of size n chosen so that every subset of size n has the same chance of being selected. This can be accomplished by giving each member not yet chosen the same chance of being selected at each draw, for example by assigning each member a number and using a random number table, drawing the numbers from a hat, etc.

Systematic random sample: Select members at evenly spaced intervals, with the first selection determined randomly. This is especially convenient when the members are already ordered and numbered, as in a population of invoices.

Stratified sample: Used for populations with strata (subgroups). It is frequently possible to improve the information drawn from the sample by sampling randomly from each stratum.

Cluster sampling: Select clusters, rather than individual members, at random. A cluster is just a group of members; examples include city blocks, multiple items packaged together, etc. This technique is attractive because of convenience, cost, time, etc.

Types of Data

Interval data: Values are real numbers. Examples: age, weight, time measurements, etc. All the calculations we will discuss (such as taking an average) are valid.

Ordinal data: Values represent a ranked ordering. Examples: preferences, grades (!), etc. Calculations involving orderings are valid; however, some research indicates that certain preference scales, such as a Likert scale, may be treated as interval data.

Nominal data: Values represent categories. Examples: marital status, gender, etc. Only calculations based on category frequency are valid.
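The four random sampling schemes above can be sketched in Python with the standard random module. This is a minimal illustration, not a prescribed procedure: the function names, the assumption that the population is a Python list, and the use of a seeded random.Random generator (standing in for a random number table or a hat) are all choices made for this example.

```python
import random

def simple_random_sample(population, n, rng):
    # Draw n members without replacement; every subset of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Take every k-th member after a randomly chosen start, where k = N // n.
    if not 1 <= n <= len(population):
        raise ValueError("need 1 <= n <= population size")
    k = len(population) // n
    start = rng.randrange(k)  # random first selection in [0, k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, sizes, rng):
    # strata: stratum name -> list of members; sizes: stratum name -> sample size.
    # A simple random sample is drawn independently within each stratum.
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

def cluster_sample(clusters, m, rng):
    # Choose m whole clusters at random and include every member of each one.
    chosen = rng.sample(clusters, m)
    return [member for cluster in chosen for member in cluster]

# Usage with a hypothetical population of 100 numbered invoices.
rng = random.Random(2024)  # seeded so the draws are reproducible
invoices = list(range(1, 101))
print(simple_random_sample(invoices, 5, rng))
print(systematic_sample(invoices, 5, rng))  # consecutive picks are 20 invoices apart
```

Note that the systematic sampler needs only one random number (the starting point), which is why it is so convenient for an already numbered population such as invoices.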
Descriptive Statistics: Graphical

We will begin our investigation of statistics by studying how to describe data. For example, the raw data set provided in the Excel file GPS.xls does not efficiently convey information or give the reader insight into its characteristics: the information is so detailed that there is simply too much of it. You may have already begun to peruse the data, perhaps looking for patterns or common features. In doing so you were attempting to reduce the information in the data set to a few simple notions. Next we will consider some formal ways to perform this reduction.

The Frequency Distribution

One may group the observations into carefully selected cells and count the number of occurrences in each cell. This yields a frequency distribution. A frequency distribution, in either tabular or graphical form (the graphical form is called a histogram), can quickly give the reader a broad view of the data set.

Choosing the number of cells is largely a matter of artistic taste. It is typically a good idea to make the cells equal in size, with perhaps the exception of the end cells. Too many cells give too much information, while too few cells lose too much information. An extreme case is a frequency distribution with one cell, as shown below.

I think you will agree that this distribution does not contain much information.

Sturges developed the following rule of thumb for determining the number of cells to use. Let n = the number of observations in the data set and k = the number of cells in the frequency distribution. Then

$$k = 1 + 3.3\log_{10}(n).$$

In our case,

$$k = 1 + 3.3\log_{10}(100) = 1 + 3.3(2) = 7.6 \approx 8.$$

It is also important to consider the range of the data (largest value − smallest value). If R is the range of the data, we would like to choose k so that R/k, the cell width, is a reasonably simple number. Why? For our problem, R ≈ 4 − 0 = 4 and R/k = 4/8 = 0.5, which makes our analysis easy.
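Sturges' rule and the counting step can be sketched in Python. The GPS.xls data are not reproduced here, so the short data list below is hypothetical, and the function names are illustrative. Rounding 7.6 to the nearest whole number follows the text; closing the last cell on the right (so that the largest value is counted) is an assumption of this sketch.

```python
import math

def sturges_cells(n):
    # Sturges' rule of thumb: k = 1 + 3.3 log10(n), rounded to the nearest integer.
    return round(1 + 3.3 * math.log10(n))

def frequency_distribution(data, k, low, width):
    # Count observations in k equal-width cells [low + i*width, low + (i+1)*width).
    # The last cell also includes its right endpoint so the largest value is counted.
    counts = [0] * k
    for x in data:
        i = int((x - low) // width)
        if i == k and x == low + k * width:
            i = k - 1
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the cells")
        counts[i] += 1
    return counts

print(sturges_cells(100))  # 8, since 1 + 3.3 * 2 = 7.6

# Hypothetical observations spanning 0 to 4, so R = 4.
data = [0.3, 1.2, 3.9, 4.0, 2.5, 0.0, 1.7]
k = 8                       # Sturges' rule with n = 100, as in the text
low, high = 0.0, 4.0
width = (high - low) / k    # R/k = 0.5
print(frequency_distribution(data, k, low, width))  # [2, 0, 1, 1, 0, 1, 0, 2]
```

Because the width 0.5 is a simple number, every cell boundary (0, 0.5, 1.0, ...) is easy to read off, which is the practical payoff of choosing k so that R/k is simple.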
Consider what would have happened if we had chosen k = 7.

Another representation equivalent to the bar chart is called a line graph.

Characteristics of Frequency Distributions
Symmetry
Skewness
Unimodal and bimodal
Bell shape

Cumulative Frequency Distributions

An alternative but equivalent way to represent a data set is the cumulative frequency distribution. In this representation, the frequency of a given cell counts not only the observations in that cell but also the observations in all prior cells. Can you recover a frequency distribution from a cumulative frequency distribution? Can you think of a situation where a cumulative frequency distribution conveys information more directly to the reader than the frequency distribution does?

Graphical Representations and Data Type

While there are no detailed rules to rely on, we can make the following observations about which types of graphs suit which types of data. Pie and bar charts make a good deal of sense for nominal or ordinal data. Histograms and ogives (graphs of cumulative frequency distributions) are effective for interval data.
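A cumulative frequency distribution is a running total of the cell frequencies, and taking differences of consecutive cumulative counts undoes it; this is one way to see that the two representations carry the same information. The Python sketch below assumes the frequencies are given as a list ordered from the first cell to the last; the function names and the example counts are hypothetical.

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    # Entry i counts the observations in cell i and in all prior cells.
    return list(accumulate(freqs))

def frequencies_from_cumulative(cum):
    # Subtract each cumulative count from the next one to recover the cell counts.
    return [curr - prev for prev, curr in zip([0] + cum[:-1], cum)]

freqs = [2, 0, 1, 1, 0, 1, 0, 2]  # hypothetical counts for eight cells
cum = cumulative_frequencies(freqs)
print(cum)                               # [2, 2, 3, 4, 4, 5, 5, 7]
print(frequencies_from_cumulative(cum))  # [2, 0, 1, 1, 0, 1, 0, 2]
```

The last cumulative count always equals the total number of observations; dividing each cumulative count by it gives cumulative relative frequencies, which are often what an ogive plots.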