Exercise 9

Use the data in RENTAL.dta for this exercise. The data on rental prices and other variables for college towns are for the years 1980 and 1990. The idea is to see whether a stronger presence of students affects rental rates. The unobserved effects model is

$$\log(\mathit{rent}_{it}) = \beta_0 + \delta_0\, y90_t + \beta_1 \log(\mathit{pop}_{it}) + \beta_2 \log(\mathit{avginc}_{it}) + \beta_3\, \mathit{pctstu}_{it} + a_i + u_{it},$$

where pop is city population, avginc is average income, and pctstu is student population as a percentage of city population (during the school year).

(i) Estimate the equation by pooled OLS and report the results in standard form. What do you make of the estimate on the 1990 dummy variable? What do you get for pctstu?

    ols lrent const y90 lpop lavginc pctstu

(ii) Are the standard errors you report in part (i) valid? Explain.

The standard errors from part (i) are not valid unless we think $a_i$ does not really appear in the equation. If $a_i$ is in the error term, the composite errors $a_i + u_{it}$ for each city are positively correlated across the two time periods, and this invalidates the usual OLS standard errors and t statistics.

(iii) Now, difference the equation and estimate by OLS. Compare your estimate of $\beta_3$ with that from part (i). Does the relative size of the student population appear to affect rental prices?

    diff lrent
    diff lpop
    diff lavginc
    diff pctstu
    ols d_lrent const d_lpop d_lavginc d_pctstu

(iv) Estimate the model by fixed effects to verify that you get identical estimates and standard errors to those in part (iii).

    panel lrent const y90 lpop lavginc pctstu --fixed-effects

The coefficient on $y90_t$ is identical to the intercept from the first-difference estimation, and the slope coefficients and standard errors are identical to those from first differencing. We do not report an R-squared because none is comparable to the R-squared obtained from first differencing. Some packages display a constant term for fixed effects estimation. It can be ignored: it is usually the average of the estimated intercepts for the cross-sectional units, and it is not especially informative.
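The claim in part (iv), that first differencing and fixed effects coincide with two time periods, can be checked numerically. The following is a minimal sketch in Python with NumPy rather than gretl. It uses synthetic data, not RENTAL.dta, and the function names `ols` and `fd_and_fe`, the sample size, and the coefficient values are all made up for illustration.

```python
import numpy as np

def ols(X, y, df_resid):
    """OLS coefficients and conventional (non-robust) standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / df_resid
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

def fd_and_fe(x80, x90, y80, y90):
    """First-difference and fixed-effects estimates for a two-period panel.

    x80, x90: (N, k) regressors in each year; y80, y90: (N,) outcomes.
    """
    n, k = x80.shape
    df = n - (k + 1)  # N - (k + 1) for FD; also 2N - N - (k + 1) for FE

    # First differences: the intercept plays the role of the 1990 dummy.
    X_fd = np.column_stack([np.ones(n), x90 - x80])
    fd = ols(X_fd, y90 - y80, df)

    # Fixed effects: subtract each unit's two-period mean.
    # The demeaned year dummy is -0.5 in 1980 and +0.5 in 1990.
    xbar, ybar = (x80 + x90) / 2, (y80 + y90) / 2
    X_fe = np.vstack([
        np.column_stack([np.full(n, -0.5), x80 - xbar]),
        np.column_stack([np.full(n, 0.5), x90 - xbar]),
    ])
    y_fe = np.concatenate([y80 - ybar, y90 - ybar])
    fe = ols(X_fe, y_fe, df)
    return fd, fe

# Synthetic two-period panel (illustrative values only).
rng = np.random.default_rng(0)
n = 50
a = rng.normal(size=n)                        # unobserved unit effect
x80 = rng.normal(size=(n, 3)) + a[:, None]    # regressors correlated with a
x90 = x80 + rng.normal(size=(n, 3))
slopes = np.array([0.1, 0.3, 0.01])
y80 = x80 @ slopes + a + rng.normal(scale=0.1, size=n)
y90 = 0.4 + x90 @ slopes + a + rng.normal(scale=0.1, size=n)

(b_fd, se_fd), (b_fe, se_fe) = fd_and_fe(x80, x90, y80, y90)
print(np.allclose(b_fd, b_fe), np.allclose(se_fd, se_fe))
```

The equality of standard errors depends on using the fixed-effects degrees of freedom, NT - N - (k + 1), which with T = 2 matches the first-difference degrees of freedom.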
(2) Suppose that, for one semester, you can collect the following data on a random sample of college juniors and seniors for each class taken: a standardized final exam score, percentage of lectures attended, a dummy variable indicating whether the class is within the student’s major, cumulative grade point average prior to the start of the semester, and SAT score.

(i) Why would you classify this data set as a cluster sample? Roughly, how many observations would you expect for the typical student?

For each student we have several measures of performance, typically three or four: one for each class the student takes that has a final exam. When we specify an equation for each standardized final exam score, the errors in the different equations for the same student are certain to be correlated: students who have more (unobserved) ability tend to do better on all tests.

(ii) Write a model that explains final exam performance in terms of attendance and the other characteristics. Use s to subscript student and c to subscript class. Which variables do not change within a student?

An unobserved effects model is

$$\mathit{score}_{sc} = \theta_c + \beta_1\, \mathit{atndrte}_{sc} + \beta_2\, \mathit{major}_{sc} + \beta_3\, \mathit{SAT}_s + \beta_4\, \mathit{cumGPA}_s + a_s + u_{sc},$$

where $a_s$ is the unobserved student effect. Because SAT score and cumulative GPA depend only on the student, and not on the particular class he or she is taking, these do not have a c subscript. The attendance rates do generally vary across classes, as does the indicator for whether a class is in the student’s major.

The term $\theta_c$ denotes different intercepts for different classes. In a panel data set, time is the natural ordering of the data within each cross-sectional unit, and the aggregate time effects apply to all units. Here there is no such common ordering, so intercepts for the different classes may not be needed. If all students took the same set of classes, then this would be similar to a panel data set, and we would want to put in different class intercepts. But with students taking different courses, the class we label as “1” for student A need have nothing to do with class “1” for student B.
Thus, the different class intercepts based on arbitrarily ordering the classes for each student probably are not needed. We can replace the class-specific intercept $\theta_c$ with $\beta_0$, an intercept constant across classes.

(iii) If you pool all of the data and use OLS, what are you assuming about unobserved student characteristics that affect performance and attendance rate? What roles do SAT score and prior GPA play in this regard?

Maintaining the assumption that the idiosyncratic error, $u_{sc}$, is uncorrelated with all explanatory variables, we need the unobserved student heterogeneity, $a_s$, to be uncorrelated with $\mathit{atndrte}_{sc}$. The inclusion of SAT score and cumulative GPA should help in this regard, because $a_s$ is then the part of ability that is not captured by $\mathit{SAT}_s$ and $\mathit{cumGPA}_s$. In other words, controlling for $\mathit{SAT}_s$ and $\mathit{cumGPA}_s$ could be enough to obtain the ceteris paribus effect of class attendance.

(iv) If you think SAT score and prior GPA do not adequately capture student ability, how would you estimate the effect of attendance on final exam performance?

If $\mathit{SAT}_s$ and $\mathit{cumGPA}_s$ are not sufficient controls for student ability and motivation, $a_s$ is correlated with $\mathit{atndrte}_{sc}$, and this would cause pooled OLS to be biased and inconsistent. We could use fixed effects instead. Within each student we compute the demeaned data, where, for each student, the means are computed across classes. The variables $\mathit{SAT}_s$ and $\mathit{cumGPA}_s$ drop out of the analysis because they do not vary across classes for a given student.
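To see why the student-level variables drop out, the within transformation can be written out explicitly. This is a sketch that assumes a common intercept $\beta_0$ and uses $\beta_1, \dots, \beta_4$ as labels for the coefficients on attendance, the major dummy, SAT score, and cumulative GPA. The labels are chosen here for illustration. Averaging the equation over the classes taken by student s gives

$$\overline{\mathit{score}}_s = \beta_0 + \beta_1\, \overline{\mathit{atndrte}}_s + \beta_2\, \overline{\mathit{major}}_s + \beta_3\, \mathit{SAT}_s + \beta_4\, \mathit{cumGPA}_s + a_s + \bar{u}_s,$$

and subtracting this from the equation for each class c yields

$$\mathit{score}_{sc} - \overline{\mathit{score}}_s = \beta_1 \left(\mathit{atndrte}_{sc} - \overline{\mathit{atndrte}}_s\right) + \beta_2 \left(\mathit{major}_{sc} - \overline{\mathit{major}}_s\right) + \left(u_{sc} - \bar{u}_s\right).$$

The intercept, the SAT and GPA terms, and $a_s$ are all constant within a student, so they cancel. Pooled OLS on the demeaned data therefore estimates $\beta_1$ and $\beta_2$ without requiring $a_s$ to be uncorrelated with attendance, but it cannot identify $\beta_3$ or $\beta_4$.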