Introductory Econometrics
Binary dependent variable
Suggested Solution
by Hieu Nguyen
Fall 2024
1.
Use the data in loanapp_b.gdt for this exercise. The binary variable to be explained is approve, which
is equal to one if a mortgage loan to an individual was approved. The key explanatory variable is
whiteskin, a dummy variable equal to one if the applicant has light skin. The other applicants in the
data set are darkskin and Hispanic. To test for discrimination in the mortgage loan market, a LPM
can be used:
approve = β0 + β1whiteskin + other factors.
(a) Regress approve on whiteskin and report the results in the usual form. Interpret the estimated
coefficient on whiteskin. Is it significant? Is it practically large?
(b) As controls, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign,
chist, pubrec, mortlat1, mortlat2, and vr. What happens to the estimated coefficient on
whiteskin? Is there still statistically significant evidence of discrimination against non-white
skin individuals?
(c) Estimate the equation in part (b) computing the White heteroskedasticity-consistent robust standard
errors. Compare the 95% confidence interval on βwhiteskin with the non-robust confidence
interval.
(d) Obtain the fitted values from the regression in part (c). Are any of them less than zero? Are any
of them greater than one?
(e) Estimate a Probit model of approve on whiteskin. Check the direction of the effect and the
statistical significance of whiteskin. Find the estimated probability of loan approval for both
whiteskin and non-white skin individuals. How do these compare with the LPM estimates?
(f) Now, add the variables hrat, obrat, loanprc, unem, male, married, dep, sch, cosign, chist,
pubrec, mortlat1, mortlat2, and vr. Is there still statistically significant evidence of discrimination
against non-white skin people? Interpret also other information from the Gretl output.
(g) Estimate the model from part (f) by Logit. Compare the estimated coefficient on whiteskin to
the Probit model.
(h) Estimate the sizes of the discrimination effects for Probit and Logit.
Solution:
(a) The benchmark LPM is:
Model 1: OLS, using observations 1-1989
Dependent variable: approve
Coefficient Std. Error t-ratio p-value
-------------------------------------------------
const 0.707792 0.0182393 38.8060 0.0000
whiteskin 0.200596 0.0198400 10.1107 0.0000
Mean dependent var 0.877325 S.D. dependent var 0.328146
Sum squared resid 203.5930 S.E. of regression 0.320098
R^2 0.048930 Adjusted R^2 0.048451
F(1, 1987) 102.2261 P-value(F) 1.81e-23
Log-likelihood -555.5405 Akaike criterion 1115.081
Schwarz criterion 1126.272 Hannan-Quinn 1119.191
Based on the standard t-test (t = 10.11, p-value < 0.001), whiteskin is statistically significant
at all conventional levels. The effect is also practically large: the estimated probability of
obtaining a mortgage loan is about 20 percentage points higher for whiteskin applicants
(0.708 + 0.201 = 0.908) than for the other applicants (0.708). This elementary linear probability
model (LPM) may still suffer from various econometric problems (e.g., many omitted variables and
inherent heteroskedasticity).
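The arithmetic behind this interpretation can be checked with a minimal Python sketch. The numbers are copied from the Model 1 output; the variable names are illustrative only and are not part of the Gretl session.

```python
# Model 1 estimates, copied from the Gretl output above.
b_const = 0.707792
b_white = 0.200596
se_white = 0.0198400

# In an LPM, the fitted value is the estimated approval probability.
p_nonwhite = b_const            # whiteskin = 0
p_white = b_const + b_white     # whiteskin = 1

# t-ratio for H0: coefficient on whiteskin equals zero.
t_ratio = b_white / se_white

print(f"P(approve | whiteskin = 0) = {p_nonwhite:.4f}")
print(f"P(approve | whiteskin = 1) = {p_white:.4f}")
print(f"difference = {p_white - p_nonwhite:.4f}, t-ratio = {t_ratio:.2f}")
```

With a single dummy regressor, these two fitted values are simply the sample approval rates in the two groups.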
(b) The model now becomes:
Model 2: OLS, using observations 1-1989 (n = 1971)
Missing or incomplete observations dropped: 18
Dependent variable: approve
Coefficient Std. Error t-ratio p-value
-------------------------------------------------
const 0.936731 0.0527354 17.7629 0.0000
whiteskin 0.128820 0.0197317 6.5286 0.0000
hrat 0.00183299 0.00126320 1.4511 0.1469
obrat 0.00543180 0.00110178 4.9300 0.0000
loanprc 0.147300 0.0375159 3.9263 0.0001
unem 0.00729893 0.00319799 2.2824 0.0226
male 0.00414414 0.0188644 0.2197 0.8261
married 0.0458241 0.0163077 2.8100 0.0050
dep 0.00682737 0.00670134 1.0188 0.3084
sch 0.00175251 0.0166498 0.1053 0.9162
cosign 0.00977222 0.0411394 0.2375 0.8123
chist 0.133027 0.0192627 6.9059 0.0000
pubrec 0.241927 0.0282274 8.5706 0.0000
mortlat1 0.0572511 0.0500120 1.1447 0.2525
mortlat2 0.113723 0.0669838 1.6978 0.0897
vr 0.0314408 0.0140313 2.2408 0.0252
Mean dependent var 0.876205 S.D. dependent var 0.329431
Sum squared resid 178.3935 S.E. of regression 0.302076
R^2 0.165582 Adjusted R^2 0.159180
F(15, 1955) 25.86339 P-value(F) 1.84e-66
Log-likelihood -429.2569 Akaike criterion 890.5139
Schwarz criterion 979.8946 Hannan-Quinn 923.3569
After adding the controls, the estimated coefficient on whiteskin falls from 0.201 to 0.129 but
remains statistically significant (t = 6.53). Discrimination in the mortgage loan market therefore
remains both statistically and economically significant: even after controlling for many other
factors, whiteskin is associated with an almost 13 percentage points higher probability of
obtaining a mortgage loan.
(c) Heteroskedasticity is inherent in the LPM: with a binary dependent variable,
Var(approve|x) = p(x)(1 - p(x)) depends on the regressors (check, e.g., the White test). The OLS
coefficients are unchanged, but valid inference requires the White heteroskedasticity-consistent
robust standard errors:
Model 3: OLS, using observations 1-1989 (n = 1971)
Missing or incomplete observations dropped: 18
Dependent variable: approve
Heteroskedasticity-robust standard errors, variant HC1
Coefficient Std. Error t-ratio p-value
-------------------------------------------------
const 0.936731 0.0593886 15.7729 0.0000
whiteskin 0.128820 0.0258693 4.9796 0.0000
hrat 0.00183299 0.00146703 1.2495 0.2116
obrat 0.00543180 0.00133099 4.0810 0.0000
loanprc 0.147300 0.0378351 3.8932 0.0001
unem 0.00729893 0.00371219 1.9662 0.0494
male 0.00414414 0.0193044 0.2147 0.8300
married 0.0458241 0.0172374 2.6584 0.0079
dep 0.00682737 0.00690380 0.9889 0.3228
sch 0.00175251 0.0171460 0.1022 0.9186
cosign 0.00977222 0.0395825 0.2469 0.8050
chist 0.133027 0.0246202 5.4031 0.0000
pubrec 0.241927 0.0427922 5.6535 0.0000
mortlat1 0.0572511 0.0662234 0.8645 0.3874
mortlat2 0.113723 0.0910697 1.2488 0.2119
vr 0.0314408 0.0144855 2.1705 0.0301
Mean dependent var 0.876205 S.D. dependent var 0.329431
Sum squared resid 178.3935 S.E. of regression 0.302076
R^2 0.165582 Adjusted R^2 0.159180
F(15, 1955) 14.97726 P-value(F) 4.04e-37
Log-likelihood -429.2569 Akaike criterion 890.5139
Schwarz criterion 979.8946 Hannan-Quinn 923.3569
Provided we know how to compute confidence intervals manually, we can also obtain them directly
in Gretl from the Model 3 menu: Analysis -> Confidence intervals. For Model 3 with robust SEs,
we get:
t(1955, 0.025) = 1.961
VARIABLE COEFFICIENT 95% CONFIDENCE INTERVAL
whiteskin 0.128820 (0.0780852, 0.179554)
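For comparison, Model 2 (with the smaller non-robust SEs) gives:
t(1955, 0.025) = 1.961
VARIABLE COEFFICIENT 95% CONFIDENCE INTERVAL
whiteskin 0.128820 (0.0901223, 0.167517)
The robust interval is about 31% wider than the non-robust one, but it still excludes zero, so the
conclusion about discrimination does not change.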
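Both intervals follow from the usual formula, coefficient plus or minus critical value times standard error. The sketch below reproduces them in Python; `conf_interval` is a hypothetical helper written for this illustration, and the critical value 1.961 is the rounded one printed by Gretl, so the last digits may differ slightly from the output above.

```python
def conf_interval(coef, se, t_crit):
    """Return (lower, upper) bounds of coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width

coef = 0.128820      # coefficient on whiteskin (same in Models 2 and 3)
t_crit = 1.961       # t(1955, 0.025), rounded as printed by Gretl

robust = conf_interval(coef, 0.0258693, t_crit)      # Model 3, HC1 SE
nonrobust = conf_interval(coef, 0.0197317, t_crit)   # Model 2, usual SE

robust_width = robust[1] - robust[0]
nonrobust_width = nonrobust[1] - nonrobust[0]

print(f"robust:     ({robust[0]:.4f}, {robust[1]:.4f})")
print(f"non-robust: ({nonrobust[0]:.4f}, {nonrobust[1]:.4f})")
print(f"width ratio: {robust_width / nonrobust_width:.3f}")
```

The width ratio equals the ratio of the two standard errors, because the coefficient and the critical value are identical in both models.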
(d) In the Gretl Model 3 menu, follow Graphs -> Fitted, actual plot -> By obs. number.
[Figure: actual and fitted values of approve by observation number]
We observe many fitted values above 1 (all of them for whiteskin applicants) but none below 0.
To some extent, this could have been expected, since the mean of approve is 0.88. Two other
optional graphical depictions of the fitted values follow.
(e) The benchmark Probit model is (in the Gretl menu follow Model -> Limited dependent variable -> Probit -> Binary...
and tick Show p-values):
Model 4: Probit, using observations 1-1989
Dependent variable: approve
Standard errors based on Hessian
Coefficient Std. Error z p-value
-------------------------------------------------
const 0.546946 0.0754350 7.2506 0.0000
whiteskin 0.783946 0.0867118 9.0408 0.0000
Mean dependent var 0.877325 S.D. dependent var 0.328146
McFadden R2 0.053312 Adjusted R2 0.050610
Log-likelihood -700.8774 Akaike criterion 1405.755
Schwarz criterion 1416.946 Hannan-Quinn 1409.865
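The directions of the effects of the individual explanatory variables, as well as their statistical
significance, can be read from the output directly, in the same way as for the OLS output: the
coefficient on whiteskin is positive and significant at all conventional levels (z = 9.04).
However, for the magnitude of the effects or for fitted/predicted values, we also need the
standard normal CDF, with $\hat{p}_i = F(\hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik})$ and $F = \Phi$:
• Estimated/fitted probability for whiteskin individuals:
$\hat{p}_i = \Phi(\hat\beta_0 + \hat\beta_1 \cdot 1) = \Phi(0.547 + 0.784 \cdot 1) = \Phi(1.331) \approx 0.908 \approx 91\%$;
• Estimated/fitted probability for non-white skin individuals:
$\hat{p}_i = \Phi(\hat\beta_0 + \hat\beta_1 \cdot 0) = \Phi(0.547 + 0.784 \cdot 0) = \Phi(0.547) \approx 0.708 \approx 71\%$.
These coincide with the LPM fitted probabilities from part (a), 0.908 and 0.708: with a single
dummy regressor, both models reproduce the sample approval rates in the two groups.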
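The two fitted probabilities can be verified without statistical tables. The following is a minimal sketch using the standard normal CDF from Python's standard library; the coefficients are copied from Model 4, and the variable names are illustrative.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF

# Model 4 (Probit) estimates, copied from the Gretl output above.
b_const = 0.546946
b_white = 0.783946

p_nonwhite = Phi(b_const)            # whiteskin = 0
p_white = Phi(b_const + b_white)     # whiteskin = 1
effect = p_white - p_nonwhite        # discrete change in probability

print(f"P(approve | whiteskin = 0) = {p_nonwhite:.4f}")
print(f"P(approve | whiteskin = 1) = {p_white:.4f}")
print(f"estimated effect of whiteskin = {effect:.4f}")
```

The difference between the two probabilities is the Probit counterpart of the LPM coefficient on whiteskin in Model 1.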
(f) The model now becomes:
Model 5: Probit, using observations 2-1989 (n = 1971)
Missing or incomplete observations dropped: 17
Dependent variable: approve
Standard errors based on Hessian
Coefficient Std. Error z p-value
-------------------------------------------------
const 2.06233 0.313176 6.5852 0.0000
whiteskin 0.520253 0.0969588 5.3657 0.0000
hrat 0.00787633 0.00696162 1.1314 0.2579
obrat 0.0276924 0.00604930 4.5778 0.0000
loanprc 1.01197 0.237240 4.2656 0.0000
unem 0.0366849 0.0174807 2.0986 0.0359
male 0.0370014 0.109927 0.3366 0.7364
married 0.265747 0.0942523 2.8195 0.0048
dep 0.0495756 0.0390573 1.2693 0.2043
sch 0.0146497 0.0958421 0.1529 0.8785
cosign 0.0860713 0.245751 0.3502 0.7262
chist 0.585281 0.0959715 6.0985 0.0000
pubrec 0.778741 0.126320 6.1648 0.0000
mortlat1 0.187624 0.253113 0.7413 0.4585
mortlat2 0.494356 0.326556 1.5138 0.1301
vr 0.201062 0.0814934 2.4672 0.0136
Mean dependent var 0.876205 S.D. dependent var 0.329431
McFadden R2 0.186602 Adjusted R2 0.164921
Log-likelihood -600.2710 Akaike criterion 1232.542
Schwarz criterion 1321.923 Hannan-Quinn 1265.385
Based on the z-test on whiteskin (z = 5.37), we still observe statistically significant
discrimination in the mortgage loan market. The interpretation of the rest of the Probit output
(McFadden R2, log-likelihood, percent correctly predicted, LR test) was discussed in detail during
the seminar.
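As a rough illustration of how these summary statistics relate to each other, the sketch below backs out the restricted (constant-only) log-likelihood from the reported McFadden R2 and then forms the LR statistic for the joint significance of all 15 slopes. It assumes the definitions McFadden R2 = 1 - ll_u/ll_0 and adjusted R2 = 1 - (ll_u - k)/ll_0; the restricted log-likelihood is not printed in the output above, so `ll_0` is a derived value, not a reported one.

```python
# Values reported for Model 5 (Probit with controls). The log-likelihood is
# entered with a minus sign: for a binary model it cannot be positive.
ll_u = -600.2710
mcfadden_r2 = 0.186602
k = 16     # estimated parameters, including the constant
df = 15    # restrictions under H0: all slope coefficients are zero

# Assumption: McFadden R2 = 1 - ll_u / ll_0, so ll_0 can be backed out.
ll_0 = ll_u / (1 - mcfadden_r2)

# Likelihood ratio statistic, chi-square with df degrees of freedom under H0.
lr_stat = 2 * (ll_u - ll_0)

# Assumed definition of the adjusted McFadden R2.
adj_r2 = 1 - (ll_u - k) / ll_0

CHI2_15_CRIT_5PCT = 24.996   # 95th percentile of the chi-square(15) distribution

print(f"restricted log-likelihood (derived) = {ll_0:.3f}")
print(f"LR statistic = {lr_stat:.2f}, 5% critical value = {CHI2_15_CRIT_5PCT}")
print(f"adjusted McFadden R2 = {adj_r2:.6f}")
```

The adjusted R2 computed this way can be compared with the value printed next to McFadden R2 as a check on the assumed definitions.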
Comparable graphical depictions (+ one extra) of fitted values for Probit (do observe differences
to the LPM fitted values):
(g) Estimated Logit model:
Model 6: Logit, using observations 2-1989 (n = 1971)
Missing or incomplete observations dropped: 17
Dependent variable: approve
Standard errors based on Hessian
Coefficient Std. Error z p-value
-------------------------------------------------
const 3.80171 0.594707 6.3926 0.0000
whiteskin 0.937764 0.172904 5.4236 0.0000
hrat 0.0132631 0.0128802 1.0297 0.3031
obrat 0.0530338 0.0112803 4.7015 0.0000
loanprc 1.90495 0.460443 4.1372 0.0000
unem 0.0665789 0.0328086 2.0293 0.0424
male 0.0663851 0.206429 0.3216 0.7478
married 0.503282 0.177998 2.8275 0.0047
dep 0.0907335 0.0733342 1.2373 0.2160
sch 0.0412288 0.178404 0.2311 0.8172
cosign 0.132059 0.446094 0.2960 0.7672
chist 1.06658 0.171212 6.2296 0.0000
pubrec 1.34067 0.217366 6.1678 0.0000
mortlat1 0.309882 0.463520 0.6685 0.5038
mortlat2 0.894675 0.568581 1.5735 0.1156
vr 0.349828 0.153725 2.2757 0.0229
Mean dependent var 0.876205 S.D. dependent var 0.329431
McFadden R2 0.186297 Adjusted R2 0.164616
Log-likelihood -600.4962 Akaike criterion 1232.992
Schwarz criterion 1322.373 Hannan-Quinn 1265.835
Comparison of Logit vs Probit vs LPM (to obtain the marginal effects for Logit and Probit
evaluated at the means of the explanatory variables, tick Show slopes at mean):
Model 6: Logit, using observations 2-1989 (n = 1971)
Missing or incomplete observations dropped: 17
Dependent variable: approve
Standard errors based on Hessian
Coefficient Std. Error z Slope
-------------------------------------------------
const 3.80171 0.594707 6.3926
whiteskin 0.937764 0.172904 5.4236 0.0967431
hrat 0.0132631 0.0128802 1.0297 0.00104057
. . .
Model 5: Probit, using observations 2-1989 (n = 1971)
Missing or incomplete observations dropped: 17
Dependent variable: approve
Standard errors based on Hessian
Coefficient Std. Error z Slope
-------------------------------------------------
const 2.06233 0.313176 6.5852
whiteskin 0.520253 0.0969588 5.3657 0.105747
hrat 0.00787633 0.00696162 1.1314 0.00127210
. . .
(Slopes are evaluated at the mean.)
Model 3: OLS, using observations 1-1989 (n = 1971)
Missing or incomplete observations dropped: 18
Dependent variable: approve
Heteroskedasticity-robust standard errors, variant HC1
Coefficient Std. Error t-ratio p-value
-------------------------------------------------
const 0.936731 0.0593886 15.7729 0.0000
whiteskin 0.128820 0.0258693 4.9796 0.0000
hrat 0.00183299 0.00146703 1.2495 0.2116
. . .
The estimated coefficients differ across the three models and, for Logit and Probit, cannot be
interpreted directly as marginal effects. However, we may use several rules of thumb to compare
the Logit, Probit, and LPM estimates quickly and roughly. The scale factors are the maximum
values of the two densities: about 0.4 for the standard normal and 0.25 for the logistic.
• We can multiply the Probit estimates by 0.4/0.25 = 1.6, or we can multiply the Logit estimates
by 0.25/0.4 = 0.625, to make the two roughly comparable;
• We can multiply the Probit estimates by 0.4 and the Logit estimates by 0.25 to make them roughly
comparable to the LPM estimates.
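Applied to the whiteskin coefficients, the rules of thumb can be sketched as follows. The coefficients are copied from Models 3, 5, and 6; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Densities at zero, which motivate the scale factors 0.4 and 0.25.
normal_pdf_0 = NormalDist().pdf(0.0)                           # about 0.399
logistic_pdf_0 = math.exp(0.0) / (1.0 + math.exp(0.0)) ** 2    # exactly 0.25

# Coefficients on whiteskin, copied from the outputs above.
b_lpm = 0.128820
b_probit = 0.520253
b_logit = 0.937764

probit_on_logit_scale = b_probit * 1.6
logit_on_probit_scale = b_logit * 0.625
probit_on_lpm_scale = b_probit * 0.4
logit_on_lpm_scale = b_logit * 0.25

print(f"normal pdf at 0 = {normal_pdf_0:.4f}, logistic pdf at 0 = {logistic_pdf_0:.2f}")
print(f"Probit x 1.6   = {probit_on_logit_scale:.3f} (Logit:  {b_logit:.3f})")
print(f"Logit  x 0.625 = {logit_on_probit_scale:.3f} (Probit: {b_probit:.3f})")
print(f"Probit x 0.4   = {probit_on_lpm_scale:.3f} (LPM:    {b_lpm:.3f})")
print(f"Logit  x 0.25  = {logit_on_lpm_scale:.3f} (LPM:    {b_lpm:.3f})")
```

Because 0.4 and 0.25 are the largest values the two densities can take, the rescaled Probit and Logit numbers tend to overstate the effect when the predicted probabilities are far from 0.5, as they are here (mean of approve is 0.88). The slopes at the mean reported by Gretl are the more careful comparison.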
(h) Compare the marginal effects at the mean (computed at the means of all explanatory variables)
for Logit and Probit ('Slope' in the Gretl output in the previous exercise) with the LPM
coefficient. They are broadly similar, with the ordering
$\hat{\beta}_{1,\text{whiteskin}} > \text{Slope}_{\text{whiteskin,Probit}} > \text{Slope}_{\text{whiteskin,Logit}}$ (0.129 > 0.106 > 0.097),
i.e., estimated discrimination effects of roughly 13, 11, and 10 percentage points. This could have
been expected given the different shapes of the linear function, the standard normal CDF, and the
logistic CDF (because of the fatter tails of the logistic PDF, the logistic CDF lies a bit below the
standard normal CDF for x > 0).
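The claim about the relative position of the two CDFs can be checked numerically. The sketch below evaluates the standard logistic and standard normal CDFs on a small, arbitrarily chosen grid of positive points and lists the three estimated effects copied from the outputs above.

```python
import math
from statistics import NormalDist

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))

normal_cdf = NormalDist().cdf   # standard normal CDF

# Arbitrary illustrative grid of positive index values.
grid = [0.5, 1.0, 2.0, 3.0]
rows = [(x, logistic_cdf(x), normal_cdf(x)) for x in grid]
for x, lam, phi in rows:
    print(f"x = {x:.1f}: logistic CDF = {lam:.4f}, normal CDF = {phi:.4f}")

# Estimated effects of whiteskin, copied from Models 3, 5, and 6.
effects = {"LPM": 0.128820, "Probit slope": 0.105747, "Logit slope": 0.0967431}
for name, value in effects.items():
    print(f"{name}: {100 * value:.1f} percentage points")
```

Note that this compares the unscaled standard distributions; the fitted Logit and Probit curves also differ through their estimated coefficients.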