Answers to Empirical Exercises for Chapter 5 - UMass


This table contains the results from seven regressions that are referenced in these answers. The dependent variable in all of the regressions is ED.

Regressor      (1)                 (2)                 (3)

Dist           −0.073** (0.013)    −0.031** (0.012)    −0.033** (0.013)
Female                             0.143** (0.050)     0.144** (0.050)
Black                              0.354** (0.067)     0.338** (0.069)
Hispanic                           0.402** (0.074)     0.349** (0.077)
Bytest                             0.092** (0.003)     0.093** (0.003)
Incomehi                           0.367** (0.062)     0.374** (0.062)
Ownhome                            0.146** (0.065)     0.143** (0.065)
DadColl                            0.570** (0.077)     0.574** (0.076)
MomColl                            0.379** (0.084)     0.379** (0.084)
Cue80                              0.024** (0.009)     0.028** (0.010)
Stwmfg                             −0.050** (0.020)    −0.043** (0.020)
Tuition                                                −0.185 (0.099)
Urban                                                  0.065 (0.063)
Intercept      13.956** (0.038)    8.862** (0.241)     8.894** (0.241)

F-statistic (p-values on joint hypotheses) and measures of fit

(a) Tuition and Urban                                  2.43 (0.089)
SER            1.807               1.538               1.538
R²             0.007               0.281               0.281

Standard errors are in parentheses below the coefficients; for the F-statistic, the p-value is in parentheses. Significant at the *5% and **1% significance level.

1. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is the estimated slope?

See column (1). The estimated slope is −0.073.

2. Run a regression of ED on Dist, but include some additional regressors to control for characteristics of the student, the student's family, and the local labor market. In particular, include as additional regressors Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. What is the estimated effect of Dist on ED? Construct a 95% confidence interval for the coefficient on Dist in the regression.

See column (2). The estimated slope is −0.031. The 95% confidence interval is −0.031 ± (1.96 × 0.012) = −0.054 to −0.008.
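The interval arithmetic can be checked with a few lines of Python. This is only a sketch that uses the rounded coefficient and standard error from column (2); the function name `confidence_interval` is an illustrative choice, not something from the exercise. With the rounded inputs the endpoints can differ from the stated ones in the third decimal place, presumably because the answer was computed from unrounded estimates (an assumption).

```python
# Coefficient and standard error on Dist, as reported (rounded) in column (2).
beta_dist = -0.031
se_dist = 0.012
z_crit = 1.96  # two-sided 5% critical value of the standard normal


def confidence_interval(beta, se, crit=z_crit):
    """Return (lower, upper) for the interval beta +/- crit * se."""
    margin = crit * se
    return beta - margin, beta + margin


lower, upper = confidence_interval(beta_dist, se_dist)
print(f"95% CI for the coefficient on Dist: [{lower:.4f}, {upper:.4f}]")
```

Both endpoints are negative, so the interval excludes zero, in line with the significance stars in the table.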

3. Is the estimated effect of Dist on ED in the regression in (2) substantively different from the regression in (1)? Based on this, does the regression in (1) seem to suffer from important omitted variable bias?

The results are quite different. The estimated slope has changed from −0.073 to −0.031. The difference is large compared to the standard error. Evidently the regression in (1) suffered from omitted variable bias.
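To put a number on "large compared to the standard error", the change in the slope can be divided by each reported standard error. This is a rough heuristic comparison using the rounded table values, not a formal test of equality across the two regressions.

```python
# Slopes on Dist and their standard errors from columns (1) and (2).
slope_1, se_1 = -0.073, 0.013
slope_2, se_2 = -0.031, 0.012

# Change in the estimated slope once the control variables are added.
change = slope_2 - slope_1

# Express the change in units of each standard error (informal comparison only).
ratio_vs_se1 = change / se_1
ratio_vs_se2 = change / se_2
print(f"change = {change:.3f}, {ratio_vs_se1:.1f} x SE(1), {ratio_vs_se2:.1f} x SE(2)")
```

The change of about 0.042 is more than three times either standard error.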

4. a. Bob is a black male. His high school was 20 miles from the nearest college. His base year composite test score (Bytest) was 58. His family income in 1980 was $26,000, and his family owned a home. His mother had attended college, but his father had not. The unemployment rate in his county was 7.5%, and the state average manufacturing hourly wage was $9.75. Predict Bob's years of completed schooling using the regression in (2).

−0.031 × 2 + 0.143 × 0 + 0.354 × 1 + 0.402 × 0 + 0.092 × 58 + 0.367 × 1 + 0.146 × 1 + 0.570 × 0 + 0.379 × 1 + 0.024 × 7.5 − 0.050 × 9.75 + 8.862 = 15.1 years

b. Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college. Predict Jim's years of completed schooling using the regression in (2).

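Jim's predicted schooling is Bob's predicted schooling + β̂₁ × (DistJim − DistBob) = 15.1 − 0.031 × (4 − 2) = 15.04 years. (This starts from the rounded value 15.1; starting from the unrounded prediction of about 15.07 gives about 15.01 years.)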
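The two predictions can be reproduced from the column (2) coefficients. The sketch below takes the regressor values from the worked answer above (Dist entered as 2 for 20 miles, Incomehi = 1, and so on) and uses the intercept as reported in the table; the dictionary layout and the helper `predict` are illustrative choices, not part of the exercise.

```python
# Rounded coefficients from column (2) of the table.
coef = {
    "Intercept": 8.862, "Dist": -0.031, "Female": 0.143, "Black": 0.354,
    "Hispanic": 0.402, "Bytest": 0.092, "Incomehi": 0.367, "Ownhome": 0.146,
    "DadColl": 0.570, "MomColl": 0.379, "Cue80": 0.024, "Stwmfg": -0.050,
}

# Bob's characteristics, coded as in the worked answer (Dist = 2 for 20 miles).
bob = {
    "Intercept": 1, "Dist": 2.0, "Female": 0, "Black": 1, "Hispanic": 0,
    "Bytest": 58, "Incomehi": 1, "Ownhome": 1, "DadColl": 0, "MomColl": 1,
    "Cue80": 7.5, "Stwmfg": 9.75,
}

# Jim is identical to Bob except for distance (40 miles, entered as 4).
jim = dict(bob, Dist=4.0)


def predict(coefficients, person):
    """Predicted ED: sum of coefficient times regressor value."""
    return sum(coefficients[name] * person[name] for name in coefficients)


ed_bob = predict(coef, bob)
ed_jim = predict(coef, jim)
print(f"Bob: {ed_bob:.2f} years, Jim: {ed_jim:.2f} years")
```

Since only Dist differs, the gap between the two predictions is the Dist coefficient times the difference in distance, −0.031 × 2 = −0.062 years.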

5. Compare the fit of the regression in (1) and (2) using the regression standard errors, R² and adjusted R² (R̄²). Why are the R² and R̄² so similar in regression (2)?

The regression in (2) fits the data much better than the regression in (1). The R² in (1) is less than 1%, but it is 28% in (2). The standard error of the regression falls from 1.81 in (1) to 1.54 in (2). The R² and adjusted R² (R̄²) in (2) are very similar because n is very large (n = 3796).
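The role of the sample size can be seen from the standard textbook relationship R̄² = 1 − [(n − 1)/(n − k − 1)] × (1 − R²). The sketch below applies it with n = 3796, the k = 11 regressors listed in column (2), and the rounded R² from the table, so the result is approximate.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


n_obs = 3796    # sample size quoted in the answer
k_regs = 11     # regressors in column (2), not counting the intercept
r2_col2 = 0.281  # rounded R-squared from the table

adj_r2_col2 = adjusted_r2(r2_col2, n_obs, k_regs)
print(f"R2 = {r2_col2:.3f}, adjusted R2 = {adj_r2_col2:.3f}")
```

The degrees-of-freedom factor (n − 1)/(n − k − 1) is within 0.3% of one here, so the adjustment moves R² only in the third decimal place.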

6. The value of the coefficient on DadColl is positive and statistically significant. What does this coefficient measure?

The coefficient is 0.57. This means that average years of education are 0.57 years higher for a student whose father attended college than for a student whose father did not, holding constant the other variables in the regression.

7. Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients (+ or −) what you would have guessed? Interpret the magnitudes of these coefficients.

These variables are measures of employment prospects (the local unemployment rate) and the opportunity cost of attending college (the average wage in manufacturing). The coefficient on the unemployment rate should be positive (a higher unemployment rate means that finding employment is more difficult, so that college is relatively more attractive). The coefficient on the manufacturing wage should be negative (a higher wage means a higher reward for working, so that college is relatively less attractive). The signs of the coefficients are as expected. The values of the coefficients are small (a 1 percentage point increase in the unemployment rate increases average years of education by 0.02 years, and a $1.00 increase in the manufacturing wage decreases average years of schooling by 0.05 years).

8. The dataset contains two other variables, Urban and Tuition. Explain why these variables might be important omitted variables. Include these variables in the regression. Are their coefficients statistically significant when tested one at a time? Are their coefficients statistically significant when tested jointly?

Tuition is another cost of attending college. With other things equal, an increase in tuition should reduce the demand for college. Commuting time is different in cities than in suburbs or rural areas, and the variable Urban might capture this effect. The coefficient on Tuition is significant at the 10% but not at the 5% level (p-value = 0.06). The magnitude of the coefficient is reasonably large: a $1000 increase in tuition reduces average years of education by 0.185 years. The coefficient on Urban is not significant (t-statistic = 1.03). The F-statistic testing that both coefficients are zero has a p-value of 0.089, so the variables are jointly significant at the 10% but not at the 5% significance level.
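The one-at-a-time tests can be reproduced from the column (3) estimates. The sketch below computes each t-statistic and a two-sided p-value using the large-sample normal approximation; the helper name `two_sided_p` is illustrative. The joint F-statistic cannot be rebuilt this way, because it also depends on the covariance between the two estimates, which the table does not report.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


# Coefficients and standard errors from column (3).
tuition_coef, tuition_se = -0.185, 0.099
urban_coef, urban_se = 0.065, 0.063

t_tuition = tuition_coef / tuition_se
t_urban = urban_coef / urban_se
p_tuition = two_sided_p(t_tuition)
p_urban = two_sided_p(t_urban)

print(f"Tuition: t = {t_tuition:.2f}, p = {p_tuition:.3f}")
print(f"Urban:   t = {t_urban:.2f}, p = {p_urban:.3f}")
```

The Tuition p-value falls between 0.05 and 0.10, and the Urban p-value is well above 0.10, matching the conclusions in the answer.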
