Econometrics, Spring 2022 Final Exam

Multiple choice (max 5) | True/False (max 5) | Problem 1 (max 6) | Problem 2 (max 14) | Total (max 30)

The time limit is 90 minutes and the exam is worth a total of 30 points. You are NOT allowed to use any books, lecture notes or electronic devices except electronic calculators. You may not cooperate with your classmates. If any such cooperation is detected, both parties will receive zero points on the final. Any violation of academic honesty will be punished to the fullest extent possible. The necessary formulas and statistical tables are included below these instructions.
_______________________________________________

Multiple choice questions (5 points max)

1. The general approach to obtaining fully robust standard errors and test statistics in the context of panel data is known as _____.
a. confounding
b. differencing
c. clustering
d. attenuating

2. If a process is said to be integrated of order one, or I(1), _____.
a. the first difference of the process is weakly dependent
b. it is stationary in levels
c. averages of such processes already satisfy the standard limit theorems
d. it does not have a unit root

3. Which of the following is used to test whether a time series follows a unit root process?
a. Wald test
b. White test
c. Johansen test
d. Augmented Dickey-Fuller test

4. Consider the following regression model: log(y) = β₀ + β₁x₁ + β₂x₁² + β₃x₃ + u. This model will suffer from functional form misspecification if _____.
a. β₀ is omitted from the model
b. u is heteroskedastic
c. x₁² is omitted from the model
d. x₃ is a binary variable

5. The significance level of a test is:
a. the probability of rejecting the null hypothesis when it is false.
b. the probability of rejecting the null hypothesis when it is true.
c. one minus the probability of rejecting the null hypothesis when it is false.
d. one minus the probability of rejecting the null hypothesis when it is true.

True/False questions (5 points max)

1.
The two stage least squares estimator is less efficient than the ordinary least squares estimator when the explanatory variables are exogenous. (True)
2. One of the disadvantages of the logistic regression function is that predicted probabilities can fall outside the (0,1) interval. (False)
3. A data set is called an unbalanced panel if it has missing years for at least some cross-sectional units in the sample. (True)
4. Weakly dependent processes are said to be integrated of order zero. (True)
5. The multiple linear regression model with a binary dependent variable is called the linear probability model. (True)

Problem 1 (6 points max)

a) What approach would you use to estimate the causal effect if you have a dataset with two groups of observational units and two-period panel data?
The correct answer: the difference-in-differences approach, which estimates the causal effect when there are two periods of panel data and two groups of observations (treated and control). However, I also gave full points for a fixed effects or first-difference model.

b) What conditions must an instrumental variable satisfy to be valid?
An instrumental variable (or instrument) should be a variable z such that:
1. z is uncorrelated with the error term: Cov(z, ε) = 0 (instrument exogeneity);
2. z is correlated with the explanatory variable x: Cov(x, z) ≠ 0 (instrument relevance).

Problem 2 (14 points max)

Suppose you want to assess the impact of an individual's race on the likelihood of having a mortgage loan approved. In the example below, the key explanatory variable is white, a dummy variable equal to one if the applicant was white. The other applicants in the data set are black and Hispanic. To test for discrimination in the mortgage loan market, a linear probability model (LPM) can be used:

approve = α₀ + α₁·white + u

a) (2pt) Suppose you obtain the following output from the regression above. Interpret the coefficient on white.
The coefficient on white (about 0.20) means that the probability of loan approval is about 20 percentage points higher for white applicants than for black and Hispanic applicants.
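Why the coefficient on white reads as a difference in approval probabilities can be checked with a minimal sketch that fits the LPM by OLS on a small simulated sample. The data and the helper name `ols_single_regressor` are hypothetical, not the exam's data set. The point is only that, with one dummy regressor, the OLS intercept equals the approval rate of the omitted group (minorities) and the slope equals the difference between the two groups' approval rates.

```python
# Minimal sketch: OLS for approve = a0 + a1*white + u with a single dummy regressor.
# The sample below is simulated/hypothetical and is NOT the exam's data set.

def ols_single_regressor(y, x):
    """Return (intercept, slope) from OLS of y on a constant and x."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical sample: 10 white applicants (9 approved), 10 minority applicants (7 approved).
white = [1] * 10 + [0] * 10
approve = [1] * 9 + [0] * 1 + [1] * 7 + [0] * 3

a0, a1 = ols_single_regressor(approve, white)

# Group-specific approval rates, computed directly from the data.
rate_white = sum(a for a, w in zip(approve, white) if w == 1) / white.count(1)
rate_minority = sum(a for a, w in zip(approve, white) if w == 0) / white.count(0)

print(f"intercept a0 = {a0:.3f}  (approval rate, minority: {rate_minority:.3f})")
print(f"slope     a1 = {a1:.3f}  (difference in rates: {rate_white - rate_minority:.3f})")
```

Because a1 is a difference of two probabilities, a value of 0.20 is read as "20 percentage points", not "20%".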
b) (3pt) Name at least one pro and one con of using an LPM.
Pro: it is easy to interpret, since each coefficient is a change in the probability of approval. Con: predicted values can fall outside the [0,1] interval.

c) (3pt) Suppose now that you run probit and logit models as well. Interpret the coefficients on white in the probit and logit models and compare them with the LPM.
In probit and logit models the magnitudes of the coefficients cannot be interpreted directly as changes in probability; only their signs can. The positive coefficient on white means that a white applicant is more likely to have the mortgage approved, which agrees with the sign in the LPM.

d) (3pt) By how much is it more likely for white people to obtain a mortgage loan compared with minorities according to the probit model? How different is this result from the LPM result?
P(approve = 1 | white = 1) = Φ(0.547 + 0.784·1) = Φ(1.331) ≈ 0.908
P(approve = 1 | white = 0) = Φ(0.547) ≈ 0.708
Comparison: 0.908 − 0.708 = 0.200, so white applicants are about 20 percentage points more likely to be approved, the same as in the LPM. This is expected: with a single binary regressor, the LPM, probit and logit models all reproduce the approval rates of the two groups exactly.

e) (3pt) By how much is it more likely for white people to obtain a mortgage loan compared with minorities according to the logit model? Note that the functional form of the logit model is Λ(·) = exp(·)/(1 + exp(·)).
Λ(0.885 + 1.409·1) = Λ(2.294) = exp(2.294)/(1 + exp(2.294)) ≈ 0.908
Λ(0.885) = exp(0.885)/(1 + exp(0.885)) ≈ 0.708
The difference is about 0.20, meaning that white applicants are about 20 percentage points more likely to have the mortgage approved, again matching the LPM and probit results.
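The probit and logit calculations in parts d) and e) can be reproduced with a short sketch using only the Python standard library. The index values 0.547, 0.784, 0.885 and 2.294 come from those answers; the logit slope is backed out as 2.294 − 0.885, and the function names are illustrative. Φ is written with `math.erf`, using the identity Φ(z) = ½(1 + erf(z/√2)).

```python
import math

def probit_cdf(z):
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic CDF Lambda(z) = exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

# Index values taken from the answers to parts d) and e).
PROBIT_B0, PROBIT_B1 = 0.547, 0.784
LOGIT_B0 = 0.885
LOGIT_B1 = 2.294 - 0.885  # slope implied by the index 2.294 at white = 1

def predicted_probs(cdf, b0, b1):
    """Return P(approve=1 | white=0), P(approve=1 | white=1) and their difference."""
    p0 = cdf(b0)
    p1 = cdf(b0 + b1)
    return p0, p1, p1 - p0

for name, cdf, b0, b1 in [("probit", probit_cdf, PROBIT_B0, PROBIT_B1),
                          ("logit", logit_cdf, LOGIT_B0, LOGIT_B1)]:
    p0, p1, diff = predicted_probs(cdf, b0, b1)
    print(f"{name}: P(white=0) = {p0:.3f}, P(white=1) = {p1:.3f}, difference = {diff:.3f}")
```

Both models give predicted probabilities of roughly 0.708 and 0.908, so the difference is about 0.20 in each case, in line with the LPM.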