USE OF WEIGHTS FOR SURVEY DATA (D-Lab Workshop)

INTRODUCTION
- Total error = (Sampling error) + Bias = (Loss of PRECISION) + Bias
  (More formally: mean squared error = sampling variance + bias$^2$.)
- Reason for weighting: the data may need adjustment to correct bias
- Main types of weights:
  - Compensate for different probabilities of selection
  - Nonresponse adjustments
  - Post-stratification adjustments

1A. DIFFERENT PROBABILITIES OF SELECTION -- BY DESIGN
- Stratified sampling (by region, province, etc.)
  - Select a separate sample in each stratum
- Different sampling fractions, for many possible reasons (if every stratum has the same sampling fraction, stratification serves only to ensure coverage)
  - Want extra cases in some strata (the usual situation)
    - Want enough cases for separate estimates by region
    - Plan to do comparisons: want equal numbers in the strata (optimal for comparisons, given equal S and cost)
  - Optimum allocation of the sample (not very common): $f_h = k\,S_h/\sqrt{c_h}$, where $k$ is a constant, $S_h$ the standard deviation and $c_h$ the cost per case in stratum $h$
    - Higher sampling fraction (f) in strata with higher variance
      - Stratified variance = weighted sum of the variances in the strata
      - Make f (sampling fraction) proportional to S (standard deviation) of the target variable
    - Higher f in strata with lower cost
      - More data for a fixed amount of money
      - f inversely proportional to the square root of the cost
- Whatever the motivation, we need to weight in order to combine data from strata that were sampled at different rates
- Usual method: case weights
  - Apply a weight to each case, inversely proportional to its sampling fraction
  - Virtually all statistical packages allow for a weight variable.

1B. DIFFERENT PROBABILITIES OF SELECTION -- AFTER THE FACT
- Probabilities unknown until the time of the interview
- Number of families in the housing unit, if only one is selected
  - Weight factor = number of families in this housing unit
- Number of eligible persons in the family, when only one person is selected from each family
  - A person living alone is certain to be selected
  - A person living with 3 others has only a 1/4 chance of being selected
  - Weight factor = number of eligible persons
- Number of telephone LINES into the household
  - Weight factor = 1 / (number of telephone lines)

WORKSHEET

2. NONRESPONSE ADJUSTMENTS
- Assumption if no adjustment is made: all nonrespondents are like the average respondent (not a realistic assumption)
- Key strategy:
  - Divide the population into several categories
  - Assume that the nonrespondents in each category are (relatively) like the respondents in the same category
  - Weight the respondents to compensate for the nonrespondents
- Common categories for adjustment:
  - Strata used for sampling purposes
    - Region, size of city, etc.
  - Time periods: month, day of week
  - Demographic categories, IF KNOWN at the time of selection
    - Male/female, education, or occupation
- Weight factor = 1 / (response rate for members of each category)
- Could also do a special nonresponse study
  - Spend extra to interview a subsample of nonrespondents
  - Weight them to represent all the nonrespondents
  - Rarely done, because of the cost
- ITEM nonresponse is a separate problem
  - Various techniques: imputation OR exclusion of cases with missing data

3. POST-STRATIFICATION ADJUSTMENTS
- Purpose: adjust for noncoverage (and perhaps also for nonresponse)
- The main idea is the same as for the other adjustments
- Divide the sample into several categories
  - e.g., classifications by sex, size of city, region
  - Make sure each category has at least about 20 cases
- For each category, get two distributions of respondents:
  1) Percent (to 3 or 4 decimals) of the respondents to the survey (weighted)
  2) Some external criterion (usually, recent census data)
- Adjustment = percent(criterion) / percent(survey), for each category
- Notes:
  - You can use total Ns instead of percents if you wish; this changes the adjustments only by a constant factor, so the result is the same once the weights are rescaled.
  - For more weighting variables/categories, you can use raking of marginals.
  - For stratified samples, post-strata should ideally be formed WITHIN the design strata, but usually this is not done because the strata do not have enough cases.

4. HOW TO DO THE WEIGHTING
- First, adjust for different probabilities of selection
  - Multiply all factors (designed or after the fact)
  - Scale the weights so that the sum of the weights equals the number of cases ($\sum_i w_i = n$) (a relative weight is usually best, although expansion weights are common)
  - Keep this weight distinct as a basic sampling weight
- Then adjust for differential nonresponse, if necessary
  - Multiply this adjustment by the sampling weight
  - This weight will include adjustments for probability of selection as well as for nonresponse
- Then do post-stratification adjustments
  - Use the preceding weight when generating the distribution of survey respondents into the specified categories
  - Multiply the post-stratification adjustment by the preceding weight for each category of respondents
  - This final weight will include the preceding adjustments as well.
  - Scale again, if necessary, to the desired sum of weights.

Final adjustments to the weights
- Problem: If there are a few cases with extreme weight values, those few cases could seriously distort the results.
  This could happen with cases from areas selected with low probability and/or low response rates and/or low coverage rates. In such situations, you might end up with estimates that depend heavily on the few cases that just happened to be included in the sample. If the sample were replicated and other cases were selected, the estimates might be very different.
- Solution: If there are a few cases with extreme weight values, it is a good idea to trim the weight or the components of the weight (such as the number of persons in a household). To do this, get the distribution of all the weight values and then (for example) set the values in the upper (and lower) 1% equal to the highest (or lowest) value remaining inside that cutoff. More elaborate schemes are sometimes applied.
- Note also that Census PUMS files use topcoding for variables like income: above a specified limit, cases are assigned the statewide mean or median of the cases with values above that limit. This is done so that a few extreme values do not exaggerate the mean and variance of those variables.

5. LOSS OF PRECISION BECAUSE OF WEIGHTING
- Criterion: a simple random sample of size n (spread proportionately over all categories of respondents)
- Sometimes weighted estimates have smaller sampling variances
  - Result of optimal allocation: oversampling high-variance strata (rare)
- Usually, however, weighting compensates for allocations of the sample made for other reasons
  - Often done just to get more cases in certain strata
  - The resulting weights are sometimes called random weights
- The effect of weighting on the precision of estimates depends on:
  - Correlation of the weight variable with Y (different for every variable)
  - Variability of the weight variable (easier to look at)
- A full analysis of the effect of weighting usually requires special computer programs for variance estimation
- However, we can estimate the expected loss in precision due to a specific sampling plan (applies to means and percentages)
- BEFORE (or after) data collection, for stratum aggregates (WORKSHEET):
  - $\mathrm{DEFF} = \left(\sum_h W_h k_h\right)\left(\sum_h W_h / k_h\right)$
    - $W_h$ = stratum population weight (the proportion of the population in stratum $h$)
    - $k_h$ = relative sampling fraction for each stratum
  - VERY USEFUL for assessing in advance the effects of various rates of oversampling (SPREADSHEET)
  - DEFF = factor by which the sampling variance increases
  - $\mathrm{DEFT} = \sqrt{\mathrm{DEFF}}$ = factor by which the standard error increases
- AFTER data collection, from the data file containing case weights:
  - The coefficient of variation (CV) is the standard deviation divided by the mean
    - CV of the weight variable = Stdev(wtvar) / Mean(wtvar)
    - $\mathrm{CV}^2 = \mathrm{Var}(wtvar) / \mathrm{Mean}(wtvar)^2$
  - $\mathrm{DEFF} = 1 + \mathrm{CV}^2$
  - Special case: if the weight is a relative weight, so that the sum of the weighted cases equals the actual n of cases, the mean of the weight variable is 1.0, and therefore $\mathrm{DEFF} = 1 + \mathrm{Var}(wtvar)$
- These formulas apply strictly only to random weighting of an SRS, but they provide useful estimates for other designs as well.
- How big are such design effects? DEFFs from Health Surveys

6. USING WEIGHTS TO SHIFT THE UNIT OF ANALYSIS
HANDOUT
- When sampling groups, are you interested in the groups or in their components?
  - In a sample of firms, do you want to estimate characteristics of the firms or of the workers?
  - Weights can shift the unit of analysis between the two.
  - But you should have a clear idea of what you want to estimate.
- The most efficient estimate (smallest standard error) will be the unweighted estimate.

Suggested Readings
- Robert M. Groves, et al., Survey Methodology, 2nd edition. Hoboken, NJ: John Wiley and Sons, 2009. [Best current summary of survey methodology; includes sections on sampling and weighting] See especially pp. 347-354 on weighting.
- Leslie Kish, Survey Sampling. New York: John Wiley and Sons, 1965, 1995. [Comprehensive work on sampling, with many examples and illustrations; a basic reference for survey samplers] See especially pp. 424-430 on loss of precision due to weighting.
- Vijay Verma and Thanh Le, "An Analysis of Sampling Errors for the Demographic and Health Surveys," International Statistical Review, vol. 64, 1996, pp. 265-294. [Source of the tables on design effects in health surveys]
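The weighting steps in Section 4 and the two design-effect formulas in Section 5 can be checked with a short script. The Python below is a minimal sketch, not part of the workshop materials: it assumes weights are held in plain lists with a parallel list of category labels, and every function name, parameter, and number in it (base_weight, poststratify, trim, the two-stratum toy design, and so on) is hypothetical. The trimming rule shown is just the simple "cap at the nearest value inside the 1% cutoff" scheme described above.

```python
"""Minimal sketch of the Section 4 weighting steps and Section 5 DEFF checks.

All function names, parameters, and toy numbers are hypothetical illustrations.
"""
from statistics import mean, pvariance


def base_weight(sampling_fraction, eligible_persons=1, phone_lines=1):
    """Sampling weight: 1/f times the after-the-fact factors from Section 1B."""
    return (1 / sampling_fraction) * eligible_persons / phone_lines


def scale_to_n(weights):
    """Rescale to a relative weight whose sum equals the number of cases."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


def nonresponse_adjust(weights, categories, response_rates):
    """Multiply each weight by 1 / (response rate of its adjustment category)."""
    return [w / response_rates[c] for w, c in zip(weights, categories)]


def poststratify(weights, categories, criterion_shares):
    """Multiply each weight by share(criterion) / share(survey) for its category.

    The survey distribution is computed with the preceding (input) weights.
    """
    total = sum(weights)
    survey_share = {}
    for w, c in zip(weights, categories):
        survey_share[c] = survey_share.get(c, 0.0) + w / total
    return [w * criterion_shares[c] / survey_share[c]
            for w, c in zip(weights, categories)]


def trim(weights, fraction=0.01):
    """Set the top and bottom `fraction` of weights equal to the nearest value
    remaining inside the cutoff (one simple trimming rule among many)."""
    ordered = sorted(weights)
    k = int(len(ordered) * fraction)  # number of cases trimmed at each end
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    return [min(max(w, low), high) for w in weights]


def deff_from_strata(pop_shares, rel_fractions):
    """Kish: DEFF = (sum_h W_h k_h) * (sum_h W_h / k_h)."""
    a = sum(W * k for W, k in zip(pop_shares, rel_fractions))
    b = sum(W / k for W, k in zip(pop_shares, rel_fractions))
    return a * b


def deff_from_weights(weights):
    """DEFF = 1 + CV^2 of the weights; the variance uses divisor n."""
    return 1 + pvariance(weights) / mean(weights) ** 2


if __name__ == "__main__":
    # Hypothetical design: two equal-sized strata of 1,000, the second sampled
    # at twice the rate of the first (f = 0.1 and f = 0.2): 100 and 200 cases.
    raw = [base_weight(0.1)] * 100 + [base_weight(0.2)] * 200
    rel = scale_to_n(raw)
    print(round(deff_from_strata([0.5, 0.5], [1, 2]), 4))  # 1.125
    print(round(deff_from_weights(rel), 4))                # 1.125
```

In this toy design the weights vary only across strata, so the BEFORE formula (from $W_h$ and $k_h$) and the AFTER formula (1 + CV² of the relative case weights) give the same DEFF of 1.125, i.e. a DEFT of about 1.06.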