STAT 518 --- Section 2.1: Basic Inference

Basic Definitions

Population: The collection of all the individuals of interest. This collection may be _______ or even ____________.

Sample: A collection of elements of the population.

Suppose our population consists of a finite number (say, N) of elements.

Random Sample: A sample of size n from a finite population such that each of the possible samples of size n was equally likely to be selected.

Another definition:

Random Sample: A sample of size n forming a sequence of n independent and identically distributed random variables X1, X2, ..., Xn.

Note that these definitions are equivalent only if the elements are drawn with replacement from the population.

If the population size is very large, whether the sampling was done with or without replacement makes little practical difference.

Multivariate Data

Sometimes each individual may have more than one variable measured on it. Each observation is then a multivariate random variable (or random vector).

Example: If the weight and height of a sample of 8 people are measured, our multivariate data are: Xi = (Yi1, Yi2), i = 1, ..., 8, one (weight, height) pair per person.

If the sample is random, then the components Yi1 and Yi2 might not be independent, but the vectors X1, X2, ..., X8 will still be independent and identically distributed. That is, knowledge of the value of X1, say, does not alter the probability distribution of X2.

Measurement Scales

If a variable simply places an individual into one of several (unordered) categories, the variable is measured on a nominal scale.

Examples:

If the variable is categorical but the categories have a meaningful ordering, the variable is on the ordinal scale.

Examples:

If the variable is numerical and the value of zero is arbitrary rather than meaningful, then the variable is on the interval scale.

Examples:

For interval data, the interval (difference) between two values is meaningful, but ratios between two values are not meaningful.
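A minimal sketch of why ratios are not meaningful on an interval scale, assuming temperature as the example (Celsius and Fahrenheit both have an arbitrary zero). The function name c_to_f and the specific temperatures are chosen here for illustration only.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three illustrative temperatures, equally spaced in Celsius.
low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = (c_to_f(t) for t in (low_c, mid_c, high_c))

# The ratio of two values depends on the unit, so it carries no meaning.
ratio_c = mid_c / low_c   # 2.0
ratio_f = mid_f / low_f   # 68 / 50 = 1.36

# Differences are comparable: equal intervals in Celsius stay equal in Fahrenheit.
diff_ratio_c = (high_c - mid_c) / (mid_c - low_c)
diff_ratio_f = (high_f - mid_f) / (mid_f - low_f)

print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f)
```

The same two temperatures give a ratio of 2 in one unit and 1.36 in the other, while the comparison of differences is unchanged by the change of unit.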
If the variable is numerical and there is a meaningful zero, the variable is on the ratio scale.

Examples:

With ratio measurements, the ratio between two values has meaning.

Weaker <------------------------------------> Stronger

Most classical parametric methods require the scale of measurement of the data to be interval (or stronger).

Some nonparametric methods require ordinal (or stronger) data; others can work for data on any scale.

A parameter is a characteristic of a population.

Examples:

Typically a parameter cannot be calculated from sample data.

A statistic is a function of random variables.

• Given the data, we can calculate the value of a statistic.

Examples of statistics:

Order Statistics

• The k-th order statistic for a sample X1, X2, ..., Xn is denoted X(k) and is the k-th smallest value in the sample.

• The values X(1) ≤ X(2) ≤ ... ≤ X(n) are called the ordered random sample.

Example: If our sample is: 14, 7, 9, 2, 16, 18 then X(3) =

Section 2.2: Estimation

• Often we use a statistic to estimate some aspect of a population of interest.

• A statistic used to estimate is called an estimator.

Familiar Examples:

The sample mean:

The sample variance:

The sample standard deviation:

These are point estimates (single numbers).

An interval estimate (confidence interval) is an interval of numbers that is designed to contain the parameter value.

A 95% confidence interval is constructed via a formula that has 0.95 probability (over repeated samples) of containing the true parameter value.

Familiar large-sample formula for CI for μ:

Some Less Familiar Estimators

• The cumulative distribution function (c.d.f.) of a random variable is denoted by F(x): F(x) = P(X ≤ x)

• This is ∫_{-∞}^{x} f(t) dt when X is a continuous r.v.

Example: If X is a normal variable with mean 100, its c.d.f. F(x) should look like:

Sometimes we do not know the distribution of our variable of interest.

The empirical distribution function (e.d.f.) is an estimator of the true c.d.f.
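A minimal sketch of how order statistics and the e.d.f. relate, assuming Python purely for illustration. The e.d.f. evaluated at x is the proportion of sample values less than or equal to x, so it is a step function that jumps at the order statistics. The code uses the six-value sample from the order-statistics example; order_statistic and edf are hypothetical helper names, not library functions.

```python
def order_statistic(values, k):
    """Return the k-th smallest value (k = 1 is the minimum)."""
    return sorted(values)[k - 1]

def edf(values, x):
    """Empirical distribution function: proportion of sample values <= x."""
    return sum(v <= x for v in values) / len(values)

sample = [14, 7, 9, 2, 16, 18]
ordered = sorted(sample)                        # the ordered random sample
third = order_statistic(sample, 3)              # third smallest value
steps = [(x, edf(sample, x)) for x in ordered]  # e.d.f. at each jump point

print(ordered, third, steps)
```

With n distinct values, the e.d.f. rises by 1/n at each order statistic, from 0 below X(1) to 1 at X(n).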
The e.d.f. can be calculated from the sample data.

Example: Suppose heights of adult females have a normal distribution with mean 65 inches and standard deviation 2.5 inches. The c.d.f. of this distribution is:

[Figure: c.d.f. of the normal distribution with mean 65 and standard deviation 2.5]

Now suppose we do NOT know the true height distribution. We randomly sample 5 females and measure their heights as:

69.3, 66.3, 62.6, 62.9, 67.4

e.d.f.:

The survival function is defined as 1 − F(x), which is the probability that the random variable takes a value greater than x.

This is useful in reliability/survival analysis, where it is the probability of the item surviving past time x.

The Kaplan-Meier estimator (p. 89-91) is a way to estimate the survival function when the survival time is observed for only some of the data values.

The Bootstrap

The nonparametric bootstrap is a method of estimating characteristics (like expected values and standard errors) of summary statistics.

This is especially useful when the true population distribution is unknown.

The nonparametric bootstrap is based on the e.d.f. rather than the true (and perhaps unknown) c.d.f.

Method: Resample data (randomly select n values from the original sample, with replacement) m times.

These bootstrap samples together mimic the population.

For each of the m bootstrap samples, calculate the statistic of interest.

These m values will approximate the sampling distribution.

From these bootstrap samples, we can estimate the:

• expected value of the statistic
• standard error of the statistic
• confidence interval of a corresponding parameter

Example: We wish to estimate the 85th percentile of the population of BMI measurements of SC high schoolers.

We take a random sample of 20 SC high school students and measure their BMI.

See code on course web page for bootstrap computations.
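The bootstrap method described in these notes can be sketched in a few lines. This is a minimal illustration in Python, not the code from the course web page. The BMI values are hypothetical, the helper names percentile and bootstrap are invented here, and the percentile uses linear interpolation between order statistics, which is only one of several common conventions. The confidence interval shown is the simple percentile interval, taken from the 2.5th and 97.5th percentiles of the bootstrap values.

```python
import random
import statistics

def percentile(values, p):
    """Return the p-th quantile (0 <= p <= 1) using linear interpolation."""
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional position in the sorted list
    lo = int(h)                    # index at or just below that position
    hi = min(lo + 1, len(xs) - 1)  # index just above, capped at the last element
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def bootstrap(sample, stat, m, rng):
    """Draw m resamples of size n with replacement; return the m statistic values."""
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(m)]

# Hypothetical BMI values for 20 students (illustration only, not course data).
bmi = [18.4, 19.1, 20.3, 20.9, 21.2, 21.8, 22.0, 22.5, 23.1, 23.4,
       23.9, 24.6, 25.0, 25.7, 26.3, 27.1, 28.4, 29.9, 31.2, 33.5]

rng = random.Random(518)  # fixed seed so the run is reproducible
boot = bootstrap(bmi, lambda s: percentile(s, 0.85), m=2000, rng=rng)

point_estimate = percentile(bmi, 0.85)   # 85th percentile of the original sample
boot_mean = statistics.mean(boot)        # estimated expected value of the statistic
boot_se = statistics.stdev(boot)         # estimated standard error of the statistic
ci = (percentile(boot, 0.025), percentile(boot, 0.975))  # 95% percentile interval

print(point_estimate, boot_mean, boot_se, ci)
```

Resampling with replacement from the observed data is the same as sampling from the e.d.f., which is why the m bootstrap values approximate the sampling distribution of the statistic without any assumption about the true c.d.f.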