Introductory Econometrics
Home Assignment 2 — Suggested Solution
by Hieu Nguyen, Fall 2024

The solution of the assignment is to be delivered electronically to 254279@muni.cz by December 11, 2024, 23:59:59 at the latest. Late submissions will not be accepted and will result in zero points. Please form teams of two people. Only one team member submits the solution, with both team members' names and email addresses on the first page of the document. Teams are required to work independently, and any form of plagiarism will be treated accordingly. Please understand that the main advantage of teamwork is the synergy from solving the problems together and the possibility to share and discuss your econometric knowledge with your teammate; it is not about a pure division of tasks. So please do cooperate and make sure you both understand all solutions completely.

The text itself can be written in any software of your choice (MS Word, LaTeX, Pages, etc.), but the final document must be submitted in .pdf format [5 MB max; .xls(x) files can be attached in a .zip]. Please name the file ECONOMETRICS Surname1 Surname2 HA02.pdf. In your report, please be clear and reasonably concise, but do explain all essential steps (e.g., important matrices) of your solution/reasoning. Keep in mind that not only the correctness of your answers and interpretations is assessed; the text-editing quality is also an integral part of your output. Fingers crossed!

Hieu Nguyen

Problem 1: Wage equation for young males (4 points: 4 x up to 1 point based on the quality and completeness of the analysis)

Dataset wage4c.gdt was used in the 1990s to study interindustry wage differentials for individuals. It contains 935 observations of monthly wages, study and occupational experience, and family and personal characteristics of young males. A specific description of the variables can be found in the dataset.

1. Check the dataset, report the main summary descriptive statistics of the original variables used in the model equation below, and briefly discuss whether everything seems all right. Then construct the dependent variable. State and explain your working hypotheses about the signs of the slope coefficients first, then estimate this model in Gretl and report the results:

   ln(wage) = β0 + β1 educ + β2 exper + β3 tenure + β4 married + ε.

Finally, interpret the estimated coefficients β̂1 and β̂4.

2. State the null hypothesis that another year of general workforce experience has the same impact on wage as another year of tenure with the current employer. Test this hypothesis at the 10% significance level. What do you conclude? You can do the test manually or via Gretl.

3. Test the joint hypothesis that β2 = β4 = 0 at the 5% significance level. First, do the test manually. Then check your results by conducting the same, but automated, test in Gretl. Note: If you only show the test in Gretl, you will not get any points for this question.

4. As economists, we might be interested in a potential issue of racial discrimination in the labor market. Enrich the model with three additional dummy variables, estimate it again, and report the results in the usual form:

   ln(wage) = β0 + β1 educ + β2 exper + β3 tenure + β4 married + β5 darkskin + β6 south + β7 urban + ν.

Holding other factors fixed, what is the estimated difference in monthly wage between dark-skin and non-dark-skin individuals? Is it statistically significant? Show your test clearly, either manually or via Gretl.

Solution:

1. The main summary descriptive statistics:

   Variable    Mean     Median   S.D.     Min      Max
   wage        958      905      404      115      3080
   educ        13.5     12.0     2.20     9.00     18.0
   exper       11.6     11.0     4.37     1.00     23.0
   tenure      7.23     7.00     5.08     0.00     22.0
   married     0.893    1.00     0.309    0.00     1.00
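Although the exercise is solved in Gretl, the statistics and the dependent variable can equally be reproduced in R. The sketch below is purely illustrative: it assumes the dataset has been exported to a file wage4c.csv with the variable names used above (the file name and the object name wage_data are assumptions, not part of the assignment).

# Illustrative R sketch (assumes wage4c.gdt has been exported to wage4c.csv)
wage_data <- read.csv("wage4c.csv")
vars <- c("wage", "educ", "exper", "tenure", "married")
summary(wage_data[, vars])                    # descriptive statistics of the original variables
wage_data$l_wage <- log(wage_data$wage)       # construct the dependent variable ln(wage)
model1 <- lm(l_wage ~ educ + exper + tenure + married, data = wage_data)
summary(model1)                               # corresponds to Model 1 reported below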
Everything seems fine according to a brief inspection of the data and the statistics above. The dataset is large, and there are no missing observations. All observations are non-negative. The sample means, maxima, and minima seem realistic both for 1990s monthly wages in USD and for the variables measured in years. It is also interesting to note the relatively high negative correlation between educ and exper, almost -46%. We also observe that nearly 90% of the sampled individuals are married.

Model 1: OLS, using observations 1-935
Dependent variable: l_wage

              Coefficient   Std. Error    t-ratio   p-value
  const       5.33065       0.114378      46.61     0.0000
  educ        0.0753568     0.00643491    11.71     0.0000
  exper       0.0141191     0.00333826    4.229     0.0000
  tenure      0.0127554     0.00255923    4.984     0.0000
  married     0.199171      0.0408196     4.879     0.0000

  Mean dependent var   6.779004     S.D. dependent var   0.421144
  Sum squared resid    136.4675     S.E. of regression   0.383066
  R-squared            0.176201     Adjusted R-squared   0.172658
  F(4, 930)            49.72908     P-value(F)           5.96e-38
  Log-likelihood       -427.0223    Akaike criterion     864.0447
  Schwarz criterion    888.2474     Hannan-Quinn         873.2734

Remarks on the estimates:
  - All coefficients appear strongly statistically significant at all reasonable levels.
  - β̂1 ≈ 0.075: a one-unit (one-year) increase in education is associated with roughly a 7.5% change in the same direction in monthly earnings (USD), ceteris paribus (log-linear functional form).
  - β̂4 ≈ 0.20: married men earn roughly 20% higher salaries than unmarried men, ceteris paribus (log-linear functional form with a dummy).

2. This is an F-test of a linear restriction that relates two coefficients. The unrestricted model is Model 1. We test

   H0: β2 = β3   vs.   HA: β2 ≠ β3,

   using   F = [(RSS_R - RSS_U)/J] / [RSS_U/(n - k - 1)]  ~  F(J, n - k - 1).

The restricted model is

   ln(wage) = β0 + β1 educ + β2 (exper + tenure) + β4 married + u.

After generating a new variable exper_plus_tenure = exper + tenure, we run the restricted regression and conduct a standard F-test with J = 1 and k = 4:

Model 3: OLS, using observations 1-935
Dependent variable: l_wage

                      Coefficient   Std. Error    t-ratio   p-value
  const               5.34544       0.102405      52.20     0.0000
  educ                0.0746565     0.00596462    12.52     0.0000
  exper_plus_tenure   0.0132947     0.00176337    7.539     0.0000
  married             0.199477      0.0407860     4.891     0.0000

  Mean dependent var   6.779004     S.D. dependent var   0.421144
  Sum squared resid    136.4799     S.E. of regression   0.382877
  R-squared            0.176126     Adjusted R-squared   0.173471
  F(3, 931)            66.34246     P-value(F)           7.00e-39
  Log-likelihood       -427.0649    Akaike criterion     862.1297
  Schwarz criterion    881.4919     Hannan-Quinn         869.5127

From the two regression outputs we compute the F-statistic:

   F ≈ [(136.4799 - 136.4675)/1] / [136.4675/930] ≈ 0.085.

The critical value for the test is F(1, 930) at the 10% level, approximately 2.71. Since F ≈ 0.085 < 2.71, we do not reject H0 at the given significance level. We conclude that the effects of another year of general experience and another year of tenure are statistically indistinguishable (β2 = β3 cannot be rejected at the 10% level).
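For reference, the same restriction test can be reproduced outside Gretl. The R sketch below assumes the data frame wage_data and the variable l_wage created in the earlier sketch (illustrative names only); it computes the F-statistic exactly as in the formula above.

# F-test of H0: beta2 = beta3, computed from the restricted and unrestricted fits
m_u <- lm(l_wage ~ educ + exper + tenure + married, data = wage_data)        # unrestricted (Model 1)
m_r <- lm(l_wage ~ educ + I(exper + tenure) + married, data = wage_data)     # restricted: equal coefficients
rss_u <- sum(resid(m_u)^2)
rss_r <- sum(resid(m_r)^2)
F_stat <- ((rss_r - rss_u) / 1) / (rss_u / df.residual(m_u))
F_stat                                                                       # approx. 0.085
pf(F_stat, 1, df.residual(m_u), lower.tail = FALSE)                          # p-value of the test
# equivalently (if the car package is installed): car::linearHypothesis(m_u, "exper = tenure")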
3. Here we test the joint significance of two coefficients, i.e., an (incomplete) set of two joint hypotheses, using an F-test. The unrestricted model is Model 1. We test

   H0: β2 = 0 and β4 = 0   vs.   HA: at least one of β2, β4 is different from 0,

   using   F = [(RSS_R - RSS_U)/J] / [RSS_U/(n - k - 1)]  ~  F(J, n - k - 1).

The restricted model is

   ln(wage) = β0 + β1 educ + β3 tenure + v.

Estimated output from Gretl:

Model 3: OLS, using observations 1-935
Dependent variable: l_wage

              Coefficient   Std. Error    t-ratio   p-value
  const       5.83613       0.0823983     70.83     0.0000
  educ        0.0612079     0.00584008    10.48     0.0000
  tenure      0.0163803     0.00252771    6.480     0.0000

  Mean dependent var   6.779004     S.D. dependent var   0.421144
  Sum squared resid    143.0720     S.E. of regression   0.391804
  R-squared            0.136332     Adjusted R-squared   0.134479
  F(2, 932)            73.55931     P-value(F)           2.18e-30
  Log-likelihood       -449.1172    Akaike criterion     904.2344
  Schwarz criterion    918.7561     Hannan-Quinn         909.7716

From the two regression outputs we compute the F-statistic (J = 2, k = 4):

   F ≈ [(143.0720 - 136.4675)/2] / [136.4675/930] ≈ 22.5.

The critical value is F(2, 930) at the 5% level, approximately 3.01. Since F ≈ 22.5 is much greater than 3.01, we reject the H0 of joint insignificance of β2 and β4 at the given significance level. We conclude that at least one of the two coefficients is statistically significantly different from zero (at the 5% significance level).

The Gretl test of linear restrictions suggests the same conclusion:

Restriction set
  1: b[exper] = 0
  2: b[married] = 0
Test statistic: F(2, 930) = 22.5043, with p-value = 2.85515e-10

Restricted estimates:
              coefficient   std. error    t-ratio   p-value
  const       5.83613       0.0823983     70.83     0.0000      ***
  educ        0.0612079     0.00584008    10.48     2.26e-24    ***
  exper       0.00000       0.00000       NA        NA
  tenure      0.0163803     0.00252771    6.480     1.48e-10    ***
  married     0.00000       0.00000       NA        NA

Standard error of the regression = 0.391804

4. The Gretl output:

Model 4: OLS, using observations 1-935
Dependent variable: l_wage

              Coefficient    Std. Error    t-ratio   p-value
  const       5.39550        0.113225      47.65     0.0000
  educ        0.0654307      0.00625040    10.47     0.0000
  exper       0.0140430      0.00318519    4.409     0.0000
  tenure      0.0117473      0.00245297    4.789     0.0000
  married     0.199417       0.0390502     5.107     0.0000
  darkskin    -0.188350      0.0376666     -5.000    0.0000
  south       -0.0909037     0.0262485     -3.463    0.0006
  urban       0.183912       0.0269583     6.822     0.0000

  Mean dependent var   6.779004     S.D. dependent var   0.421144
  Sum squared resid    123.8185     S.E. of regression   0.365471
  R-squared            0.252558     Adjusted R-squared   0.246914
  F(7, 927)            44.74706     P-value(F)           1.16e-54
  Log-likelihood       -381.5490    Akaike criterion     779.0979
  Schwarz criterion    817.8223     Hannan-Quinn         793.8638

We observe a statistically significant negative impact of darkskin: having dark skin decreases the monthly salary of young men on average by almost 19% compared to non-dark-skin individuals, ceteris paribus. This can be interpreted as aggregate empirical evidence of racial discrimination. The conclusion is based on a standard two-sided t-test, or simply on inspecting the p-value in the Gretl output above.
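A side note on the interpretation: with a log dependent variable, the usual percentage reading of a dummy coefficient is only an approximation; the exact ceteris paribus percentage difference is 100·(exp(β̂) - 1). A quick check using plain R arithmetic on the estimates reported above:

100 * (exp(-0.188350) - 1)   # darkskin: about -17.2% exactly, rather than the approximate -18.8%
100 * (exp(0.199417) - 1)    # married (Model 4): about +22.1% exactly, rather than +19.9%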
Problem 2: Wage Equation and Return to Education in the 1970s (6 points)

One of the most recent Nobel laureates in Economic Sciences (2021), David Card, used wage and education data for a sample of U.S. men in 1976 in his 1993 working paper to estimate the return to education. Dataset card.csv contains 3,010 observations of hourly wages, schooling and occupational experience, family and personal characteristics, and potential proxies for unobserved personal qualities. Please find a specific description of the variables and the units of measurement in the attached .txt file. Be aware that there are missing values in the dataset.

The task of this creative empirical exercise is to develop your own explanatory/predictive model for the determination of individual wages (a 'wage equation'). The goal is not to develop the best possible model based on the given dataset but to create a relatively simple yet useful and intuitive empirical model that includes the essential variables while following the general suggestions in parts 1 to 4. You should also carefully report the progress of your analysis step by step. If you would like to extend your analysis even further (either in an individual part or in general, e.g., with a multicollinearity analysis or by considering more variables), you are more than welcome to do so.

1. (1.5 pts) Suggest a few (two or three) of the intuitively most important explanatory variables for the determination of wages, and report and briefly describe their main summary statistics (also include the dependent variable, wage). Discuss suitable functional forms of the variables and estimate the resulting model with OLS. Comment on and interpret the important findings from the OLS results.

2. (1.5 pts) Suggest two additional potentially important explanatory variables and, taking advantage of the four important variable-selection criteria, analyze whether they belong to the model. Also compare your new model to two alternative models with different functional forms between variables.

3. (2 pts) Add two or three potentially important intercept dummies from the dataset and explain why your selection makes sense. Re-estimate the extended model, interpret the newly estimated coefficients, and decide whether the new dummies should remain in the model. Next, add a slope dummy interacting with one of the included quantitative variables, explain your motivation, and interpret the newly added estimated coefficient. Finally, add another interaction term (between two quantitative variables or two dummies), re-estimate, and interpret. What is your resulting model after this step of the analysis? Interpret the overall significance of the regression.

4. (1 pt) Apply the White test for heteroskedasticity. Should we re-estimate the model to obtain heteroskedasticity-robust standard errors? If yes, please do so and interpret your results.

Attached:
1. Datasets: card.csv, wage4c.gdt
2. Description of variables: card description2.txt

Solution: Students are free to choose their regressors; hence, the following answers are suggestions, not fixed results. Grades will be assigned case by case.

1. We choose the two most important regressors, educ and exper, and we use a quadratic form for exper because, as we have seen before, we might expect concave returns to cumulative experience. Also note that we use log(wage) as the dependent variable, since wage usually changes in percentage terms while exper is measured in years, i.e., in unit terms (a semi-log form). As expected, all of the coefficients are statistically significant and the return to experience is indeed concave, but the adjusted R² is rather low.

log(wage) ~ educ + exper + exper^2

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    4.4685404    0.0686899    65.054    < 2e-16   ***
educ           0.0931707    0.0035802    26.024    < 2e-16   ***
exper          0.0897828    0.0070636    12.711    < 2e-16   ***
I(exper^2)    -0.0024859    0.0003377    -7.361    2.35e-13  ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3982 on 3006 degrees of freedom
Multiple R-squared: 0.1958, Adjusted R-squared: 0.195
F-statistic: 244 on 3 and 3006 DF, p-value: < 2.2e-16
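The concavity can be quantified directly from the reported estimates: the marginal return to experience is d log(wage)/d exper = β̂_exper + 2 β̂_exper² · exper. A quick calculation (plain R arithmetic on the coefficients above; the object names are illustrative):

b1 <- 0.0897828; b2 <- -0.0024859
100 * (b1 + 2 * b2 * c(5, 10, 15))   # roughly 6.5%, 4.0% and 1.5% per additional year
-b1 / (2 * b2)                       # implied turning point: about 18 years of experience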
2. We add two additional variables: IQ and nearc4. A higher IQ should help in getting better jobs, and having a four-year college nearby makes it easier to build up work-related skills and knowledge.

log(wage) ~ educ + exper + exper^2 + IQ + nearc4

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    3.9840038    0.0789789    50.444    < 2e-16   ***
educ           0.0757135    0.0037452    20.216    < 2e-16   ***
exper          0.0967456    0.0069212    13.978    < 2e-16   ***
I(exper^2)    -0.0028335    0.0003307    -8.569    < 2e-16   ***
IQ             0.0062390    0.0005607    11.128    < 2e-16   ***
nearc4         0.0888423    0.0153913    5.772     8.62e-09  ***

Residual standard error: 0.3877 on 3004 degrees of freedom
Multiple R-squared: 0.2383, Adjusted R-squared: 0.237
F-statistic: 187.9 on 5 and 3004 DF, p-value: < 2.2e-16

All variables are strongly significant. A roughly 0.6% wage return per IQ point looks realistic, and so does the coefficient on the nearc4 dummy (about 9%). The coefficient on educ went down from 0.093 to 0.076; for experience the change is much smaller. The adjusted R² increases from 0.195 to 0.237, a considerable relative change, indicating that the new variables add explanatory power.

Considering the four important selection criteria: theory/intuition suggests both variables should be included, as discussed above; we also most likely observe a reduction of omitted variable bias, as both variables are positively correlated with educ, whose coefficient went down; the adjusted R² increases; and both new variables are strongly statistically significant. Both variables thus most likely belong to the model.

Regarding different functional forms, we further check whether there is a diminishing return to IQ and whether the relationship between wage and educ might be better described as an elasticity (log-log):

lm(formula = I(log(wage)) ~ educ + exper + I(exper^2) + IQ + I(IQ^2) + nearc4, data = wage_data_filtered)

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    2.786e+00    2.473e-01    11.265    < 2e-16   ***
educ           7.773e-02    3.750e-03    20.725    < 2e-16   ***
exper          9.567e-02    6.896e-03    13.874    < 2e-16   ***
I(exper^2)    -2.778e-03    3.295e-04    -8.430    < 2e-16   ***
IQ             3.060e-03    4.799e-04    6.376     2.09e-10  ***
I(IQ^2)       -1.233e-05    2.413e-05    -5.111    3.41e-07  ***
nearc4         8.687e-02    1.533e-02    5.666     1.60e-08  ***

Residual standard error: 0.386 on 3003 degrees of freedom
Multiple R-squared: 0.2448, Adjusted R-squared: 0.2433
F-statistic: 162.3 on 6 and 3003 DF, p-value: < 2.2e-16

log(wage) ~ log(educ) + exper + exper^2 + IQ + nearc4

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    2.8523436    0.1219714    23.385    < 2e-16   ***
I(log(educ))   0.8388992    0.0435142    19.279    < 2e-16   ***
exper          0.0777618    0.0068672    11.324    < 2e-16   ***
I(exper^2)    -0.0019325    0.0003374    -5.728    1.12e-08  ***
IQ             0.0069262    0.0005550    12.481    < 2e-16   ***
nearc4         0.0887111    0.0154835    5.729     1.11e-08  ***

Residual standard error: 0.3898 on 3004 degrees of freedom
Multiple R-squared: 0.2299, Adjusted R-squared: 0.2286
F-statistic: 179.4 on 5 and 3004 DF, p-value: < 2.2e-16

In both models all coefficients are strongly significant, but only for the first model does the adjusted R² increase. IQ thus seems to exhibit diminishing returns, similarly to exper, which is intuitive: higher-IQ people will, on average, tend to have higher income, but not at a steadily linear rate in IQ. This can also be observed visually in Figure 1. On the other hand, the lower R² of the second model supports our original semi-log functional relationship between wage and educ. Recall that two models can only be compared using R² if they have the same dependent variable.
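The functional-form comparison above can be carried out compactly. The sketch below assumes the card data have been loaded as a data frame named wage_data_filtered (the name used in the output above) with observations missing wage or IQ dropped, and compares the three specifications by adjusted R²; all three share the same dependent variable, log(wage), so the comparison is legitimate.

# Compare the three candidate specifications by adjusted R-squared (illustrative sketch)
m1 <- lm(log(wage) ~ educ + exper + I(exper^2) + IQ + nearc4, data = wage_data_filtered)
m2 <- update(m1, . ~ . + I(IQ^2))                                                   # quadratic in IQ
m3 <- lm(log(wage) ~ I(log(educ)) + exper + I(exper^2) + IQ + nearc4,
         data = wage_data_filtered)                                                 # log-log in educ
sapply(list(m1, m2, m3), function(m) summary(m)$adj.r.squared)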
3. We add two intercept dummies: we take the best model so far and add south and black. Black men were, on average, less educated in the 1970s, and the South was economically a less important region at the time, so we expect negative signs for both.

log(wage) ~ educ + exper + exper^2 + IQ + IQ^2 + nearc4 + black + south

               Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)    3.689e+00    2.608e-01    14.147    < 2e-16   ***
educ           7.397e-02    3.686e-03    20.070    < 2e-16   ***
exper          8.333e-02    6.781e-03    13.025    < 2e-16   ***
I(exper^2)    -2.483e-03    3.235e-04    -7.676    2.20e-14  ***
IQ             1.917e-02    4.890e-03    3.921     9.02e-05  ***
I(IQ^2)       -8.172e-05    2.417e-05    -3.382    0.00073   ***
nearc4         6.301e-02    1.529e-02    4.121     3.87e-05  ***
black         -1.217e-01    2.094e-02    -5.810    6.92e-09  ***
south         -1.313e-01    1.541e-02    -8.518    < 2e-16   ***

Residual standard error: 0.3779 on 3001 degrees of freedom
Multiple R-squared: 0.277, Adjusted R-squared: 0.2751
F-statistic: 143.7 on 8 and 3001 DF, p-value: < 2.2e-16

The signs are as expected, the new coefficients are strongly significant, and the adjusted R² improved: being black and coming from the South was not economically beneficial at that time. The estimated coefficients on IQ and nearc4 in particular have changed markedly, which suggests a further reduction of omitted variable bias. The new variables thus remain in the model.

Let us additionally insert black * educ, a slope dummy, i.e., an interaction term between a dummy and a quantitative variable. The motivation is that a significant parameter would indicate discrimination in the return to education.

log(wage) ~ educ + exper + exper^2 + IQ + IQ^2 + nearc4 + black + south + black * educ

                 Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)      3.881e+00    2.675e-01    14.512    < 2e-16   ***
educ             6.910e-02    3.993e-03    17.303    < 2e-16   ***
exper            8.552e-02    6.830e-03    12.521    < 2e-16   ***
I(exper^2)      -2.321e-03    3.271e-04    -7.094    1.62e-12  ***
IQ               1.669e-02    4.947e-03    3.374     0.000759  ***
I(IQ^2)         -6.867e-05    2.449e-05    -2.804    0.005073  **
nearc4           6.138e-02    1.528e-02    4.018     6.01e-05  ***
black           -3.724e-01    8.247e-02    -4.515    6.57e-06  ***
south           -1.306e-01    1.539e-02    -8.488    < 2e-16   ***
I(black * educ)  2.024e-02    6.441e-03    3.143     0.001699  **

Residual standard error: 0.3773 on 3000 degrees of freedom
Multiple R-squared: 0.2794, Adjusted R-squared: 0.2772
F-statistic: 129.2 on 9 and 3000 DF, p-value: < 2.2e-16

The adjusted R² has slightly increased, and the new coefficient is statistically significant. The positive estimate on the interaction term is somewhat counterintuitive; at the same time, the estimate on educ has decreased from 0.074 to 0.069 and the coefficient on black has fallen from -0.12 to -0.37. The model thus now captures the returns to education in greater detail: it isolates the effect of educ alone, the effect of black alone (which is much stronger than before), and the interaction of the two, which partly offsets those declines.
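In terms of the estimates above, the slope dummy implies group-specific returns to schooling, d log(wage)/d educ = 0.0691 + 0.0202 · black. A quick calculation:

b_educ <- 0.06910; b_black_educ <- 0.02024
100 * b_educ                     # non-black men: about 6.9% per additional year of schooling
100 * (b_educ + b_black_educ)    # black men: about 8.9% per additional year of schooling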
We now add nearc4 * black, an interaction term between two dummies, for a potentially similar reason as before, but now with respect to the impact of proximity to a four-year college.

log(wage) ~ educ + exper + exper^2 + IQ + IQ^2 + nearc4 + black + south + educ * black + nearc4 * black

                    Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)         3.868e+00    2.698e-01    14.337    < 2e-16   ***
educ                6.903e-02    3.999e-03    17.261    < 2e-16   ***
exper               8.558e-02    6.834e-03    12.524    < 2e-16   ***
I(exper^2)         -2.325e-03    3.274e-04    -7.101    1.54e-12  ***
IQ                  1.690e-02    4.981e-03    3.394     0.000699  ***
I(IQ^2)            -6.966e-05    2.463e-05    -2.828    0.004718  **
nearc4              6.463e-02    1.759e-02    3.674     0.000243  ***
black              -3.681e-01    8.328e-02    -4.420    1.02e-05  ***
south              -1.305e-01    1.540e-02    -8.472    < 2e-16   ***
I(black * educ)     2.060e-02    6.514e-03    3.163     0.001579  **
I(nearc4 * black)  -1.295e-02    3.479e-02    -0.372    0.709736

Residual standard error: 0.3774 on 2999 degrees of freedom
Multiple R-squared: 0.2794, Adjusted R-squared: 0.277
F-statistic: 116.3 on 10 and 2999 DF, p-value: < 2.2e-16

In this setup, the coefficient on nearc4 * black is insignificant and estimated negative, and the adjusted R² decreased. The only difference worth mentioning is a slight increase in the effect of nearc4. Considering the four important selection criteria: there is no strong intuition behind this new term; the change in nearc4 reflects multicollinearity rather than a reduction of omitted variable bias; and the technical criteria (adjusted R², t-test) also suggest this dummy interaction is not important for explaining wage differentials. The resulting model is therefore the previous one, with the black * educ interaction only.

The p-value for the null hypothesis of overall insignificance of the regression is practically zero (< 2.2e-16), so the null is clearly rejected (which is also obvious from the individual t-statistics).

4. White test for heteroskedasticity: we use the model from the previous part, append its residuals to the dataset, and specify the White auxiliary regression of the squared residuals on the regressors and their squares:

res^2 ~ educ + exper + I(exper^2) + IQ + I(IQ^2) + nearc4 + black + south + I(black*educ)
        + I(educ^2) + I(exper^4) + I(IQ^4) + I(nearc4^2) + I(black^2) + I(south^2) + I((black*educ)^2)

(the squares of the 0/1 dummies nearc4, black, and south coincide with the dummies themselves and are dropped as collinear when the auxiliary regression is estimated). Running the heteroskedasticity test directly in Gretl is also accepted.

We do not report the estimated auxiliary regression here because of its size, but some of its parameters appear strongly significant, e.g., the one on educ. The resulting p-value of the F-test of the null hypothesis of joint insignificance is negligible (1.556e-05), and similarly for the LM test (0.000175). We therefore reject the null of homoskedasticity at any standard significance level. As a check of correct coding, the calculated p-value of this F-test must equal the p-value of the test of overall insignificance of the auxiliary regression that is reported directly with the estimated model.
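A compact way to carry out such a White test in R is sketched below. The object names m_final (the model with the black * educ interaction) and wage_data_filtered are illustrative, and the sketch assumes the model was estimated on the complete-case data frame so that the residuals align with its rows.

# White test via the auxiliary regression of squared residuals (a sketch)
wage_data_filtered$res2 <- resid(m_final)^2
aux <- lm(res2 ~ educ + exper + I(exper^2) + IQ + I(IQ^2) + nearc4 + black + south +
            I(black * educ) + I(educ^2) + I(exper^4) + I(IQ^4) + I((black * educ)^2),
          data = wage_data_filtered)           # squares of 0/1 dummies omitted (identical to the dummies)
r2 <- summary(aux)$r.squared
LM <- nobs(aux) * r2                           # LM statistic: n * R^2 of the auxiliary regression
pchisq(LM, df = aux$rank - 1, lower.tail = FALSE)     # LM p-value
fst <- summary(aux)$fstatistic
pf(fst[1], fst[2], fst[3], lower.tail = FALSE)        # F-test p-value (overall significance of aux)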
We should therefore recalculate the standard errors in a heteroskedasticity-robust form:

                    Estimate       Std. Error     t value    Pr(>|t|)
(Intercept)         3.8815e+00     3.1682e-01     12.2513    < 2.2e-16  ***
educ                6.9100e-02     4.1287e-03     16.7366    < 2.2e-16  ***
exper               8.5516e-02     6.8208e-03     12.5376    < 2.2e-16  ***
I(exper^2)         -2.3207e-03     3.2320e-04     -7.1805    8.718e-13  ***
IQ                  1.6689e-02     5.9075e-03     2.8250     0.004759   **
I(IQ^2)            -6.8671e-05     2.8729e-05     -2.3903    0.016896   *
nearc4              6.1381e-02     1.5065e-02     4.0743     4.735e-05  ***
black              -3.7235e-01     8.0756e-02     -4.6108    4.179e-06  ***
south              -1.3063e-01     1.5660e-02     -8.3419    < 2.2e-16  ***
I(black * educ)     2.0241e-02     6.1768e-03     3.2770     0.001061   **

and compare them with the non-robust standard errors of the original model:

                    Estimate          Std. Error        t value      Pr(>|t|)
(Intercept)         3.881495e+00      2.674754e-01      14.511600    3.615748e-46
educ                6.909963e-02      3.993452e-03      17.303232    5.153031e-64
exper               8.551639e-02      6.830091e-03      12.520533    4.279323e-35
I(exper^2)         -2.320718e-03      3.271440e-04      -7.093871    1.620077e-12
IQ                  1.668900e-02      4.946838e-03      3.373638     7.512332e-04
I(IQ^2)            -6.867056e-05      2.448669e-05      -2.804404    5.073420e-03
nearc4              6.138072e-02      1.527587e-02      4.018148     6.009867e-05
black              -3.723512e-01      8.246550e-02      -4.515236    6.568315e-06
south              -1.306338e-01      1.539129e-02      -8.487513    3.269500e-17
I(black * educ)     2.024143e-02      6.441038e-03      3.142559     1.691116e-03

We observe that for some variables the robust standard errors are larger, so the t-statistics have decreased, especially in the cases of IQ and educ. This does not threaten the specification of our model: the coefficient estimates themselves are unchanged and the sample is large. Nevertheless, the usual (non-robust) estimate of the error variance was indeed biased under heteroskedasticity.
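For completeness, heteroskedasticity-robust standard errors of the kind reported above can be obtained in R, for instance, with the sandwich and lmtest packages (a sketch; m_final is again the illustrative name of the final model object, and the exact HC variant used in the output above is not specified — HC1 is shown here as a common default).

library(sandwich)   # heteroskedasticity-consistent covariance estimators
library(lmtest)     # coeftest() for tests with a user-supplied covariance matrix
coeftest(m_final, vcov = vcovHC(m_final, type = "HC1"))   # HC1-robust standard errors and t-tests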