ࡱ> ~}!` Gbjbj\\ 8n>>?%       XXX8Y<Z,`0Z$TZYYY[d[d`YYYdZZZZ```$#8D ```8Tr6       Checking for Multicollinearity Using SAS (commands=day3_finan_collin.sas) The examples in this handout revisit the multiple regression analysis performed using the CARS data set on Day 2. We begin working with the original CARS data set in the SASDATA2 library, exclude the strange observation with YEAR = 0, and assess the pairwise correlations of several additional predictor variables: libname sasdata2 "C:\temp\sasdata2"; data cars2; set sasdata2.cars; if year ne 0; run; /* assess pairwise correlations of continuous variables */ proc corr data = cars2; var weight year mpg engine horse accel cylinder; run; The CORR Procedure 7 Variables: WEIGHT YEAR MPG ENGINE HORSE ACCEL CYLINDER Simple Statistics Variable N Mean Std Dev Sum Minimum Maximum WEIGHT 405 2975 843.54637 1204910 1613 5140 YEAR 405 75.93580 3.74177 30754 70.00000 82.00000 MPG 397 23.55113 7.79170 9350 10.00000 46.60000 ENGINE 405 194.50988 104.91125 78777 68.00000 455.00000 HORSE 399 104.86216 38.56586 41840 46.00000 230.00000 ACCEL 405 15.51235 2.80290 6283 8.00000 24.80000 CYLINDER 405 5.46914 1.70966 2215 3.00000 8.00000 Simple Statistics Variable Label WEIGHT Vehicle Weight (lbs.) YEAR Model Year (modulo 100) MPG Miles per Gallon ENGINE Engine Displacement (cu. inches) HORSE Horsepower ACCEL Time to Accelerate from 0 to 60 mph (sec) CYLINDER Number of Cylinders Pearson Correlation Coefficients Prob > |r| under H0: Rho=0 Number of Observations WEIGHT YEAR MPG ENGINE WEIGHT 1.00000 -0.30990 -0.83014 0.93335 Vehicle Weight (lbs.) <.0001 <.0001 <.0001 405 405 397 405 YEAR -0.30990 1.00000 0.57608 -0.37932 Model Year (modulo 100) <.0001 <.0001 <.0001 405 405 397 405 MPG -0.83014 0.57608 1.00000 -0.80388 Miles per Gallon <.0001 <.0001 <.0001 397 397 397 397 ENGINE 0.93335 -0.37932 -0.80388 1.00000 Engine Displacement (cu. inches) <.0001 <.0001 <.0001 405 405 397 405 HORSE 0.86506 -0.41946 -0.77619 0.89949 Horsepower <.0001 <.0001 <.0001 <.0001 399 399 391 399 ACCEL -0.43847 0.30753 0.42775 -0.56226 Time to Accelerate from 0 to 60 mph (sec) <.0001 <.0001 <.0001 <.0001 405 405 397 405 CYLINDER 0.89482 -0.35706 -0.77394 0.95183 Number of Cylinders <.0001 <.0001 <.0001 <.0001 405 405 397 405 Pearson Correlation Coefficients Prob > |r| under H0: Rho=0 Number of Observations HORSE ACCEL CYLINDER WEIGHT 0.86506 -0.43847 0.89482 Vehicle Weight (lbs.) <.0001 <.0001 <.0001 399 405 405 YEAR -0.41946 0.30753 -0.35706 Model Year (modulo 100) <.0001 <.0001 <.0001 399 405 405 MPG -0.77619 0.42775 -0.77394 Miles per Gallon <.0001 <.0001 <.0001 391 397 397 ENGINE 0.89949 -0.56226 0.95183 Engine Displacement (cu. inches) <.0001 <.0001 <.0001 399 405 405 HORSE 1.00000 -0.70885 0.84355 Horsepower <.0001 <.0001 399 399 399 ACCEL -0.70885 1.00000 -0.52849 Time to Accelerate from 0 to 60 mph (sec) <.0001 <.0001 399 405 405 CYLINDER 0.84355 -0.52849 1.00000 Number of Cylinders <.0001 <.0001 399 405 405 We note several extremely large positive and negative correlations, especially of the ENGINE variable with other possible predictors of MPG. Including several predictors in a multiple regression model that are highly correlated with each other (multicollinearity) can have an adverse effect on estimates of regression coefficients (signs may be flipped entirely) and their significance tests (standard errors tend to be inflated and are much larger than they should be, relative to the case where predictors are not correlated). We consider fitting a regression model using all of these predictors of MPG: /* fit a regression model using all possible predictors */ proc reg data = cars2; model mpg = weight year engine horse accel cylinder; run;quit; The REG Procedure Model: MODEL1 Dependent Variable: MPG Miles per Gallon Number of Observations Read 405 Number of Observations Used 391 Number of Observations with Missing Values 14 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 6 19070 3178.30004 268.83 <.0001 Error 384 4539.97495 11.82285 Corrected Total 390 23610 Root MSE 3.43844 R-Square 0.8077 Dependent Mean 23.48286 Adj R-Sq 0.8047 Coeff Var 14.64233 Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Intercept Intercept 1 -14.31877 4.78551 -2.99 0.0029 WEIGHT Vehicle Weight (lbs.) 1 -0.00680 0.00067085 -10.14 <.0001 YEAR Model Year (modulo 100) 1 0.75456 0.05271 14.31 <.0001 ENGINE Engine Displacement (cu. 1 0.00834 0.00747 1.12 0.2647 inches) HORSE Horsepower 1 -0.00200 0.01417 -0.14 0.8877 ACCEL Time to Accelerate from 0 to 1 0.07502 0.10392 0.72 0.4708 60 mph (sec) CYLINDER Number of Cylinders 1 -0.34586 0.33376 -1.04 0.3007 Results to note are highlighted in boldface. Despite the fact that ENGINE, HORSE, ACCEL, and CYLINDER all have significant bivariate correlations with MPG, none of them appear to be significant predictors of MPG based on the t-tests for their coefficients in the regression model. Whats happening here? We can assess collinearity diagnostics using additional options supplied to Proc Reg (vif, tol, and collin): /* refit model, and request collinearity diagnostics */ proc reg data = cars2; model mpg = weight year engine horse accel cylinder / vif tol collin; run; quit; Parameter Estimates Variance Variable Label DF Tolerance Inflation Intercept Intercept 1 . 0 WEIGHT Vehicle Weight (lbs.) 1 0.09416 10.62064 YEAR Model Year (modulo 100) 1 0.80734 1.23864 ENGINE Engine Displacement (cu. 1 0.04964 20.14676 inches) HORSE Horsepower 1 0.10298 9.71021 ACCEL Time to Accelerate from 0 to 1 0.36895 2.71042 60 mph (sec) CYLINDER Number of Cylinders 1 0.09382 10.65908 Collinearity Diagnostics Condition ---------------Proportion of Variation--------------- Number Eigenvalue Index Intercept WEIGHT YEAR ENGINE 1 6.66806 1.00000 0.00002873 0.00015625 0.00004058 0.00022630 2 0.28737 4.81704 0.00047589 0.00053364 0.00083990 0.01247 3 0.02426 16.57992 0.00295 0.00647 0.00357 0.06229 4 0.00993 25.90982 0.00750 0.20868 0.01533 0.00060254 5 0.00500 36.53406 0.00298 0.53551 0.00061181 0.76121 6 0.00459 38.12909 0.01096 0.19390 0.09026 0.16068 7 0.00080065 91.25974 0.97510 0.05475 0.88936 0.00252 Collinearity Diagnostics ---------Proportion of Variation--------- Number HORSE ACCEL CYLINDER 1 0.00026356 0.00023012 0.00018279 2 0.00388 0.01143 0.00119 3 0.25738 0.07604 0.02274 4 0.07830 0.15214 0.30804 5 0.02938 0.14182 0.31621 6 0.43904 0.49405 0.33015 7 0.19175 0.12428 0.02147 The tolerance (requested by the tol option) is the proportion of variance in a given predictor that is NOT explained by all of the other predictors, while the VIF (or Variance Inflation Factor) is simply 1 / tolerance. The VIF represents a factor by which the variance of the estimated coefficient is multiplied due to the multicollinearity in the model. In other words, the variance of the estimated coefficient for ENGINE is 20 times larger than it would be if the predictors were orthogonal (i.e., not correlated). A good global check for a multicollinearity problem is to see if the largest condition index is greater than 30. This is a general rule of thumb, and we definitely see a problem in this model. Predictors with large proportions of variation for eigenvalues that are very small (the last three eigenvalues in this case) tend to be highly correlated, and could possibly be combined rather than considered separately. At this point, one could arbitrarily choose one of a set of highly correlated predictors to retain in the model, or possibly combine the correlated predictors into a single variable (perhaps using a technique known as principal components analysis). We arbitrarily only include ENGINE as an additional predictor, and refit the model: /* refit model, including ENGINE only */ proc reg data = cars2; model mpg = weight year engine /*horse accel cylinder*/ / vif tol collin; run; quit; Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Intercept Intercept 1 -14.37464 3.99146 -3.60 0.0004 WEIGHT Vehicle Weight (lbs.) 1 -0.00684 0.00057750 -11.84 <.0001 YEAR Model Year (modulo 100) 1 0.76191 0.05082 14.99 <.0001 ENGINE Engine Displacement (cu. 1 0.00148 0.00479 0.31 0.7573 inches) Parameter Estimates Variance Variable Label DF Tolerance Inflation Intercept Intercept 1 . 0 WEIGHT Vehicle Weight (lbs.) 1 0.12614 7.92792 YEAR Model Year (modulo 100) 1 0.85080 1.17536 ENGINE Engine Displacement (cu. 1 0.11994 8.33762 We still see a problem with variance inflation due to the correlation of ENGINE with WEIGHT (to the point where WEIGHT does not seem to have a significant relationship with MPG), suggesting that our original model from yesterday would be sufficient. The largest condition index for this model was about 62. Be very wary of multicollinearity and its side effects when fitting multiple regression models using several continuous predictors! Centering variables (see Day 2 handout) can help, especially when working with interactions between continuous predictors. Fitting Regression Models Using SAS INSIGHT Type insight into the command line dialog box in the SAS window to start SAS INSIGHT. Select a defined library and a SAS data set to work with. Select Analyze, and then Fit (Y X) to fit a regression model. Select the dependent (Y) variable, and any independent (X) predictor variables that are defined in the SAS data set. Click OK to fit the model and generate estimates of the regression coefficients. To investigate additional diagnostic plots after the model has been fitted, use the Graphs menu. In the case of simple linear regression, confidence curves can be drawn around the fitted line by using the Curves menu. Additional tables of output (e.g., Collinearity Diagnostics) can be requested using the Tables menu.     PAGE  PAGE 5 ()JK εdK5KdKdK5+hxCJOJQJ^JaJfHq 1hxhxCJOJQJ^JaJfHq 7hxhx5CJOJQJ\^JaJfHq 1h(y5CJOJQJ\^JaJfHq 7h(yh(y5CJOJQJ\^JaJfHq 1h(yh(yCJOJQJ^JaJfHq hxhx5CJ(aJ(h*hx5CJ$aJ$h*hj5CJ0aJ0h*hx5CJ0aJ0)JK ! " : m r s t u v w B 7$8$H$gd 7$8$H$gdxgdx$a$gdxGG ! " & ' + m p q r u v  lͱleTFTFTFTFTFTh PCJOJQJ^JaJ h PhCJOJQJ^JaJ hxhr+hrCJOJQJ^JaJfHq +hxCJOJQJ^JaJfHq 1hxhxCJOJQJ^JaJfHq 7hxhx5CJOJQJ\^JaJfHq 1hxhyjCJOJQJ^JaJfHq 1hyjhyjCJOJQJ^JaJfHq B C Z  q +,e.p_78 7$8$H$gdQ!R!T!,"I"Z""""*##########?$B$C$G$H$I$J$þúúǺzazazazaH1hOh0fJCJOJQJ^JaJfHq 1h0fJh0fJCJOJQJ^JaJfHq 7h0fJh0fJ5CJOJQJ\^JaJfHq hyj hyjhyj1hyjhyjCJOJQJ^JaJfHq h0fJ h)g5h)ghxh Ph h PhCJOJQJ^JaJh PCJOJQJ^JaJhCJOJQJ^JaJUVst2Ca >< 7$8$H$gd<x  fr$%~01<=HIT U 7$8$H$gdU S!T!####$?$I$J$$$$$I%%%%%&&^&& 7$8$H$gdO 7$8$H$gd0fJgdxgd 7$8$H$gdJ$%%''M(F*+++,,,-)-B-g-o-q-t-v-y----------ϽϹhO1h*hSDCJOJQJ^JaJfHq 4h*hSDCJOJQJ\^JaJfHq 1hSDhSDCJOJQJ^JaJfHq h)gh*h*5h!h* h0fJ5h0fJhyj#h0fJh0fJ5CJOJQJ^JaJ h0fJh0fJCJOJQJ^JaJ#h0fJh0fJ5CJOJQJ^JaJh0fJCJOJQJ^JaJ&& 'O'''''(K(L(M((((.)/)))F***+r+++++-gdx 7$8$H$gd0fJ-----!.&.,.-.f.g..//g//0f000.1Q111111 7$8$H$gd!gdSD 7$8$H$gdSDgdx--..!.$.&.*.+.,.-.g//0f00011P1Q11114556507 8̱̪ugYG#h(yh!5CJOJQJ^JaJh!CJOJQJ^JaJhOCJOJQJ^JaJh!CJOJQJ^JaJ#h!h!5CJOJQJ^JaJ h!h!CJOJQJ^JaJhSD h*hSD4h*hSD5CJOJQJ^JaJfHq 1h*hSDCJOJQJ^JaJfHq 4h*hSDCJOJQJ\^JaJfHq 1D2223_334y445565r5s5566R66607z7788::;gdxgd! 7$8$H$gd! 888888/828;8889:f:u:=1=5=6=9=======? @ @ @AEBFB~l~^~lRhSDhSD5CJaJhOCJOJQJ^JaJ#hSDhSD5CJOJQJ^JaJ hSDhSDCJOJQJ^JaJ+hSDCJOJQJ^JaJfHq 7hSDhSD5CJOJQJ\^JaJfHq 1hSDhSDCJOJQJ^JaJfHq hSD5h*h*h*5hSDh0fJh(yh!5CJaJ ;;==0=1=H=======#>>>>;??? @ @F@G@@@@GAgdSD 7$8$H$gdSDgdxGAAAFBGBHB{C|C{D|D}DDDEE=E>E|E}ECFDFGGGGGh^hgd+.W & Fgd+.WgdxgdSD 7$8$H$gdSDFBGBHBACzC{CCC{D|D}DDEEEEELEWEaE{E}EEEBFDFFFGGGGwG~GGGGGGGGGGGGGGGGGGGGGGGGGƼƼƧhhr0JmHnHu hO0JjhO0JUhOjhOUhEh+.Wh+.W5 h+.W5 hE5h+.Wh]chahSDhh0fJ9GGGGGGGGGGGGGGG & Fgd+.W &`#$gd6&P1h:p*/ =!"#$% @@@ NormalCJ_HaJmH sH tH DA@D Default Paragraph FontRiR  Table Normal4 l4a (k@(No List4 @4 *Footer  !.)@. * Page Number?n!z!z z z zI) 8?@<:)JK!":mrstuvw  BCZq+,e.p_7 8 U V  s t 2 C a ><x  fr$%~01<=HITUST?IJI^ O K L M .!/!!!F"""#r#####%%%%%!&&&,&-&f&g&&''g''(f(((.)Q))))))D***+_++,y,,5-6-r-s--..R...0/z//002233550515H5555555#6666;777 8 8F8G8888G999F:G:H:{;|;{<|<}<<<====>=|=}=C>D>?????!Y!!t!!!1!1!1!1!1!1!1!1!1!1!1!1!1!1!1!!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!!!!1!!1!1!1!!!!!!!!!!!!!!!!!!!!!!!!!!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!!!!1!1!1!1!1!1!!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!!!!!!!!1!1!1!1!1!1!1!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!D[!!!!!!!!!!!!!!!!!!!!y!)JK!":mrstuvw  BCZq+,e.p_7 8 U V  s t 2 C a ><x  fr$%~01<=HITUST?IJI^ O K L M .!/!!!F"""#r#####%%%%%!&&&,&-&f&g&&''g''(f(((.)Q))))))D***+_++,y,,5-6-r-s--..R...0/z//002233550515H5555555#6666;777 8 8F8G8888G999F:G:H:{;|;{<|<}<<<====>=|=}=C>D>??????????????????000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 00 00 00 00 00 0@0@0@0I00I00I00I00I00@0I00I0000I0 0)JK!":mvw  BCZq+,e.p_7 8 U V  s t 2 C a ><x  fr$%~01<=HITU?IJI^ O K L M .!/!!!F"""#r#####%%%%%!&&&,&-&f&g&&''g''(f(((.)Q))))))D***+_++,y,,5-6-r-s--..R...0/z//002233550515H5555555#6666;777 8 8F8G8888G999F:G:H:{;|;{<|<}<<<====>=|=}=C>D>???I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00{I00zK00yK00K00K00I00I00I00K00I00I00I00I00I00I00I00I00I00K00I00I00I00I00I00I00I00I00I00K00I00K00I00I00I00I00I00I00I00I00I00I00I00I00K00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00K00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00K00I00I00I00I00I00I00?I00I00I00I00I00K00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00I00K00I00I00I00I00I00I00I00I00I00I00I00I000000000000000000000000000000000000000000000000000000000 00 00 00 00 00 $$$' J$- 8FBG$')-025B <U &-1;GAGG%(*+,./1346G&  '!!=(>("A(B("rr?uu?9*urn:schemas-microsoft-com:office:smarttagsplace8*urn:schemas-microsoft-com:office:smarttagsCity ƼEM'+<?]bOU7 : U X s v ]a #,/8;69DGPSNQ/4?G  - 2 3 6 4d?d?3QHX)?# 2(Checking for Multicollinearity Using SAS Welchome Oh+'0 $0 P \ ht|,Checking for Multicollinearity Using SAS  Normal.dot Welchome4Microsoft Office Word@^в@X@p z  6՜.+,0 hp|   s d?d )Checking for Multicollinearity Using SAS Title  !"#$%&'()*+,-./012345679:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklnopqrstvwxyz{|Root Entry F η1Table8iWordDocument8nSummaryInformation(mDocumentSummaryInformation8uCompObjq  FMicrosoft Office Word Document MSWordDocWord.Document.89q