Introductory Econometrics
Home Assignment 1
Suggested Solution
by Hieu Nguyen
Fall 2024
Solution of the assignment is to be delivered electronically by DATE 23:59:59
the latest. Late submissions will not be accepted, resulting in zero points.
Form teams of two people, please. Only one team member is supposed to
submit the solution with both team members’ names and email addresses on
the first page of the document. Teams are required to work independently, and
any form of plagiarism will be treated accordingly. Please understand that the
main advantage of teamwork is the synergy from solving the problems together
and the possibility to share and discuss your econometric knowledge with your
teammate. It is not about a pure division of tasks. So, please, do cooperate and
make sure you both understand all solutions completely.
The text itself can be written in any software of your choice (MS Word,
LaTeX, Pages etc.), but the .pdf format [5 MB max, .xls(x) can be attached in
.zip] of the final document is required.
Please, name the file Surname1 Surname2 HA01.pdf.
In your report, please, be clear and reasonably concise, but do explain all
essential steps (e.g., important matrices) of your solution/reasoning. Keep in
mind that not only the correctness of your answers and interpretations is assessed,
but also the text-editing quality is an integral part of your output.
Fingers crossed!
Hieu Nguyen
1
Problem 1: Test scores
(2 points, 0.5pt each)
A nationwide test score has a mean of 63 points and a variance of 121.
1. Convert the following raw scores to standardized Z values: 52, 91.
2. What raw scores correspond to standardized values Z = 2 and Z = −1.5?
3. Assuming that the test score is normally distributed, what is the probability
that a randomly selected individual who participated in the test has
obtained a test score higher or equal to 41?
4. Assuming normality again, what is the probability that a randomly selected
individual has obtained a test score between 55 and 75?
Solution:
We employ formulas from the lecture #1 slides for the standardization of a
random variable, for the probability computational rule, and the definition of
the CDF:
X ∼ N(µ, σ2
) =⇒ Z =
X − µ
σ
∼ N(0, 1),
P(X > x) = 1 − P(X ≤ x).
We then compute:
1.
Z52 =
52 − µ
σ
=
52 − 63
√
121
=
−11
11
= −1
(note that the variance σ2
= 121 so the standard deviation σ = 11);
similarly:
Z91 =
91 − 63
11
≈ 2.55.
2.
Z = 2 =
X − µ
σ
=
X − 63
11
=⇒ X = 85;
similarly:
Z = −1.5 =⇒ X = 46.5.
3.
P(55 ≤ X ≤ 75) = P(X ≤ 75) − P(X ≤ 55) =
P
X − 63
11
≤
75 − 63
11
− P
X − 63
11
≤
55 − 63
11
= P(Z75 ≤ 1.09) − P(Z55 ≤ −0.73) =
FZ(1.09) − FZ(−0.73) ≈ 0.8621 − 0.2327 = 62.9%
2
Problem 2: Modeling demand for beer consump-
tion
(8 points)
Let us consider a simple regression model to explain the demand for beer.
From the theory of consumer choice in microeconomics, we know that the demand
for goods also depends on income. We will thus focus on this trivialized
linear relationship. The data file HomeAssignment 01 Problem2data.xlsx contains
a data sample of 30 observations of annual beer consumption (in liters) and
annual income (in USD thousands) collected from randomly selected households.
Answer the following questions. Make sure you show all the matrices you
construct and compute in your solution.
1. (1 pt) Formulate the econometric model. Using the OLS formula, estimate
the intercept and slope parameters. Show the estimated model equation.
2. (1 pt) Interpret the meaning of the estimated coefficient of income. Does
the direction of the income effect follow your economic intuition? In case
it does not, provide a possible explanation(s).
3. (1 pt) Does the estimated intercept make sense in this situation? If yes,
provide your economic interpretation. If not, explain why
4. (1 pt) Find and list the model’s estimated/fitted values and residuals. Do
the sum of the residuals up to zero?
5. (0.5 pt) Predict the beer consumption for households with an annual income
of USD 60,000 and with USD 30,000.
6. (1pt) Consider carefully the Classical Assumptions step-by-step. Which
of them are likely to be violated? Explain your reasoning properly.
7. (1 pt) What are the consequences in case some specific Classical Assumptions
are violated? Think mainly about OLS properties (unbiasedness,
consistency, efficiency).
8. (1 pt) Comment on the overall results of your analysis. Does the model
suggest a realistic relationship between beer consumption and households’
income?
Attached: HomeAssignment 01 Problem2data.xlsx
Solution:
1. We are asked to analyze the influence of income on beer consumption.
Since beer can be considered the normal good the demand for which reflects
a direct relationship with a consumer’s income, we might expect a
3
positive relation:
beer consumption = f(income).
This relationship can be presented in a simple linear regression model
form:
beer consumption = β0 + β1income + ϵ,
where we expect β1 > 0.
Let us rewrite the model and the data in matrix form:
y = Xβ + ϵ,
where:
y =
























































81.7
56.9
49.9
64.1
65.4
51.7
64.1
58.1
46.3
61.7
65.3
57.8
63.5
50
65.9
46.8
48.3
55.6
53.8
47.9
57
51.6
51.6
54.2
57.7
51.7
44.3
55.9
52.1
52.5
























































, X =
























































1 35.1
1 36.6
1 51.6
1 35.5
1 37.2
1 48.4
1 37.2
1 37.6
1 48.4
1 38.2
1 39.4
1 38.7
1 40.0
1 46.7
1 40.5
1 48.8
1 40.4
1 41.1
1 48.1
1 41.1
1 42.5
1 46.7
1 42.4
1 43.4
1 47.3
1 43.9
1 45.9
1 44.5
1 46.0
1 44.8
























































, β =
β0
β1
.
To find the OLS estimates, we compute the following matrices:
X′
X =
30 1278
1278 55040.06
,
4
(X′
X)−1
=
3.0718 −0.0713
−0.0713 0.0017
,
X′
y =
1683.40
70973.82
,
which gives us the following:
ˆβ =
ˆβ0
ˆβ1
=
108.82
−1.24
,
and thus the estimated model is:
beer consumption = 108.82 − 1.24income.
Please note that the different rounding might lead to slightly different
numerical results in the following questions (which is not considered a
mistake, of course).
2. Interpretation: one more USD thousand of annual income is associated
with a decrease in beer consumption by 1.24 liter. The negative direction
of the effect surprisingly does not follow our basic economic intuition outlined
above. The are two main possible explanations. First, maybe our
first impression above was not entirely theoretically correct, and beer is
instead an example of an inferior good: with increasing income, people
might reduce beer consumption (maybe for health reasons and awareness
of the negative aspects of alcoholic beverages or because, on the other
hand, they switch to higher quality and more expensive alcoholic drinks).
Second, since our model is highly trivialized, the estimated effect can be
considerably biased because of incorrect model specifications: other important
right-hand side (RHS) variables likely correlated with income are
missing/omitted in the equation (price, price of substitutes). If the bias is
strong, it can even switch the direction of the estimated effect. Also, the
underlying consumption function may be nonlinear in variables, while we
estimate a purely linear consumption function. All these effects end up
in the stochastic error term and might (very likely in this case) lead to a
violation of the Classical Assumption of zero conditional mean.
3. Rather not. One can simply see that the cloud of data is very far away
from zero and even the smallest annual incomes exceed 35 thousands.
Zero income is thus rather a theoretical extreme far form reality (a potential
idea of homeless households tending to consume large volumes of
beer makes some sense but these data are likely unavailable due to the
nonresponse bias). Potential additional impacts captured by the intercept
hinder its interpretation oven more: it also absorbs a possible nonzero
mean of the error term (see random sampling assumption) and a potential
constant impact of any specification errors (e.g., omitted explanatory
variables, will learn later)
5
4. The fitted values and residuals (ei = beer consumptioni−beer consumptioni)
are:
beer consumption =
























































65.4
63.5
45.0
64.9
62.8
48.9
62.8
62.3
48.9
61.6
60.1
60.9
59.3
51.0
58.7
48.4
58.8
58.0
49.3
58.0
56.2
51.0
56.4
55.1
50.3
54.5
52.0
53.8
51.9
53.4
























































, e =
























































16.3
−6.6
4.9
−0.8
2.6
2.8
1.3
−4.2
−2.6
0.1
5.2
−3.1
4.2
−1.0
7.2
−1.6
−10.5
−2.4
4.5
−10.1
0.8
0.6
−4.8
−0.9
7.4
−2.8
−7.7
2.1
0.2
−0.9
























































The residuals indeed sum up 0.00.
5. Predictions:
beer consumptionincome=30 = 108.82 − 1.24 · 30 ≈ 71.6,
beer consumptionincome=60 = 108.82 − 1.24 · 60 ≈ 34.4.
6. • Linearity: From the theory of consumer choice in microeconomics,
we also know that the demand for goods does not only depend on
income but also on price and prices of other goods in the economy—particularly
substitutes (wine, liquors, etc.) and possibly also
6
complements (water served in restaurants, non-alcoholic beverages,
etc.). being most likely violated because the model is too simple,
i.e., not correctly specified in terms of RHS explanatory variables.
Also, the functional form linear in variables might not be completely
correct (but this will be discussed in other lectures later).
• Random sampling: Not violated because the sample is collected
from randomly selected households.
• Zero conditional mean: Most likely violated as essential RHS
variables omitted from the equation (price, price of substitutes) are
most likely correlated to the overall income level of the population.
All these effects end up in the stochastic error term, which leads to a
correlation between the error and the income variable. Intuitively, the
OLS estimator then incorrectly assigns to the included explanatory
variables parts of the effects of the omitted variables (to the extent
of how strongly they are correlated).
• Homoskedasticity: Perhaps violated since the differences in behaviors
of high-income and low-income consumers.
• No perfect collinearity: Cannot be violated because we only
have one explanatory variable.
• Normality of the error term: The error term can be considered
a cumulation of many additional influences that are not captured in
the model, including only the income variable. Thus, this assumption
is likely not violated.
7. Based on the last part, the violation of zero conditional mean leads to a
biased and inconsistent OLS estimator. On the other hand, as we do not
observe an indication of a violation of homoskedasticity, efficiency of the
OLS estimator does not seem affected.
8. The analysis suggests a negative impact of increasing annual income on
the beer consumption among households and can be expressed by the
estimated model equation:
beer consumption = 108.82 − 1.24income.
This result goes against our basic economic intuition. Still, it can be
explained by the potential inferiority of beer or as a result of the biased
and inconsistent OLS estimator of this relationship because our model is
highly trivialized. Especially other important RHS variables are missing
from the equation, such as the price or price of substitutes, that are most
likely correlated with the income variable. This most likely leads to a
violation of zero conditional mean.
7