LECTURE 3 1 / 36
Introduction to Econometrics
INTRODUCTION TO LINEAR REGRESSION ANALYSIS II
Dali Laxton
October 7, 2022

REVISION: THE PREVIOUS LECTURE 2 / 36
• (Desired) properties of an estimator:
§ An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it is estimating
§ An estimator is consistent if it converges to the value of the true parameter as the sample size increases
§ An estimator is efficient if the variance of its sampling distribution is the smallest possible

REVISION: THE PREVIOUS LECTURE 3 / 36
• We explained the principle of the OLS estimator: minimizing the sum of squared differences between the observations and the regression line
  yi = β0 + β1xi + εi
• We found the formulae for the estimates:
  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  β̂0 = ȳ − β̂1x̄

REVISION: THE PREVIOUS LECTURE 4 / 36
• We explained that the stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
• Remember that all of these factors are included in the error term and may alter its properties
• The properties of the error term determine the properties of the estimates

WARM-UP EXERCISE 5 / 36
• You receive a unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Naturally, you are very curious about the effect of experience on wages.
• You run an OLS regression of monthly wage in CZK on the number of years of experience and obtain the following results:
1. Interpret the meaning of the coefficient of experi.
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic? Explain your answer.

ON TODAY'S LECTURE 6 / 36
• We will derive the estimation formula for multivariate OLS
• We will list the assumptions about the error term and the explanatory variables that are required in classical regression models
• We will show that under these assumptions, OLS is the best estimator available for regression models
• The rest of the course will mostly deal, in one way or another, with the question of what to do when one of the classical assumptions is not met
• Readings: Studenmund, chapter 4; Wooldridge, chapters 5, 8, 9, 12

ORDINARY LEAST SQUARES WITH SEVERAL EXPLANATORY VARIABLES 7 / 36
• Usually, there is more than one explanatory variable in a regression model
• Multivariate model with k explanatory variables:
  yi = β0 + β1xi1 + β2xi2 + . . . + βkxik + εi
• For observations 1, 2, . . . , n, we have:
  y1 = β0 + β1x11 + β2x12 + . . . + βkx1k + ε1
  y2 = β0 + β1x21 + β2x22 + . . . + βkx2k + ε2
  ...
  yn = β0 + β1xn1 + β2xn2 + . . . + βkxnk + εn

MATRIX NOTATION 8 / 36
• We can write this system in matrix form: y is the n × 1 vector of observations of the dependent variable, X is the n × (k + 1) matrix containing a column of ones (for the intercept) and the observations of the k explanatory variables, β is the (k + 1) × 1 vector of coefficients, and ε is the n × 1 vector of error terms
• In simplified notation:
  y = Xβ + ε

OLS - DERIVATION UNDER MATRIX NOTATION (OPTIONAL) 9 / 36
• Minimizing the sum of squared residuals (y − Xβ)′(y − Xβ) with respect to β yields the OLS estimate
  β̂ = (X′X)⁻¹X′y
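To make the matrix formula concrete, here is a minimal numerical sketch (not part of the original slides; the data are simulated and all names are invented for the example). It computes β̂ = (X′X)⁻¹X′y with numpy and cross-checks the result against numpy's least-squares solver:

```python
import numpy as np

# Simulate a small dataset: y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 + 3 * x1 - 1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix X: a column of ones (intercept) plus the regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)    # approximately [2, 3, -1]
print(beta_lstsq)  # same values
```

In practice, solving the least-squares problem directly (np.linalg.lstsq or np.linalg.solve) is numerically more stable than explicitly inverting X′X; the explicit inverse is shown here only to mirror the formula.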
MEANING OF REGRESSION COEFFICIENT 10 / 36
• Consider the multivariate model
  Q = β0 + β1P + β2Ps + β3Y + ε
  estimated as
  Q̂ = 31.50 − 0.73P + 0.11Ps + 0.23Y
  where Q is the quantity demanded, P the commodity's price, Ps the price of a substitute, and Y disposable income
• The meaning of β1 is the impact of a one-unit increase in P on the dependent variable Q, holding constant the other included independent variables Ps and Y
• When price increases by 1 unit (and the price of the substitute good and income remain the same), quantity demanded decreases by 0.73 units

EXERCISE 11 / 36
• Remember the unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working).
• Because you realize that wages may not be linearly dependent on experience, you add an additional variable exper²i into your model and obtain the following results:
  ŵagei = 14450 + 1160 · experi − 25 · exper²i
1. What is the overall impact of increasing the number of years of experience by 1 year?
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic now? Explain your answer.

THE CLASSICAL ASSUMPTIONS 12 / 36
1. Linearity: the regression model is linear in the parameters (coefficients)
2. Random sampling: the data is a random sample drawn from the population and each data point follows the population equation
3. No perfect collinearity: the values of the explanatory variables are not all the same and no explanatory variable is a perfect linear function of any other explanatory variable(s)
4. Zero conditional mean: the values of the explanatory variables must contain no information about the mean of the unobserved factors, i.e. the explanatory variables are uncorrelated with the error term
5. Homoskedasticity: the error term has a constant variance
6. Normality of the error term: the error term is normally distributed

1. LINEARITY IN PARAMETERS 13 / 36
The regression model is linear in coefficients.
• Linearity in variables is not required
• Example: the production function Y = A · K^β1 · L^β2, for which we suppose A = exp(β0 + ε), can be transformed so that
  ln Y = β0 + β1 ln K + β2 ln L + ε
  and linearity in coefficients is restored
• Note that it is the linearity in coefficients that allows us to rewrite the general regression model in matrix form

EXERCISE 14 / 36
Which of the following models is/are linear?

EXERCISE 15 / 36
Which of the following models is/are linear?

2. RANDOM SAMPLING 16 / 36
The data is a random sample drawn from the population and each data point follows the population equation.
• Discussed during the last class

3. NO PERFECT COLLINEARITY 17 / 36
The values of the explanatory variables are not all the same and no explanatory variable is a perfect linear function of any other explanatory variable(s).
• If this condition does not hold, we talk about (multi)collinearity
• Multicollinearity can be perfect or imperfect
• Perfect multicollinearity: one explanatory variable is an exact linear function of one or more other explanatory variables
§ In this case, OLS is incapable of distinguishing one variable from the other
§ OLS estimation cannot be conducted
§ Example: we include dummy variables for men and women together with the intercept
• Mathematically, perfect collinearity makes the matrix X′X singular (its determinant is zero), so it cannot be inverted in the formula β̂ = (X′X)⁻¹X′y
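A minimal sketch of why perfect collinearity breaks the estimation (simulated data, invented names): when one column of X is an exact linear function of another, X′X is rank-deficient, so (X′X)⁻¹ does not exist and the OLS formula has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)

# The third column is an exact multiple of the second: perfect collinearity
X = np.column_stack([np.ones(n), x, 2 * x])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2 instead of 3: X'X is rank-deficient
print(np.linalg.det(XtX))          # zero (up to floating-point error)

# Because (X'X)^{-1} does not exist, beta_hat = (X'X)^{-1} X'y is not defined;
# regression software typically drops one of the collinear variables or stops
# with an error.
```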
3. NO PERFECT COLLINEARITY 18 / 36
• Imperfect multicollinearity: there is a linear relationship between the variables, but there is some error in that relationship
§ Example: we include two variables that both proxy for individual health status
• Consequences of multicollinearity:
§ Estimated coefficients remain unbiased
§ But the standard errors of the estimates are inflated, making variables appear insignificant even though they might be significant
• Solution: drop one of the variables

EXERCISE 19 / 36
• Which of the following pairs of independent variables would violate the assumption of no multicollinearity? (That is, which pairs of variables are perfect linear functions of each other?)
§ right shoe size and left shoe size (of students in the class)
§ consumption and disposable income (in the United States over the last 30 years)
§ Xi and 2Xi
§ Xi and (Xi)²

4. ZERO CONDITIONAL MEAN: ZERO POPULATION MEAN 20 / 36
The error term has a zero population mean.
• Notation: E[εi] = 0 or E[ε] = 0
• Idea: observations are distributed around the regression line, and the average of the deviations is zero
• On average, we make no "mistakes"
• This assumption is satisfied as long as an intercept is included in the equation
§ Why the intercept matters: with y = β0 + β1x + ε, taking expectations gives E(y) = β0 + β1E(x) + E(ε); the OLS first-order conditions force the fitted line through the sample means, ȳ = β̂0 + β̂1x̄, so the residuals average to zero and any nonzero mean of the error is absorbed into the intercept

4. ZERO CONDITIONAL MEAN 21 / 36
All explanatory variables are uncorrelated with the error term.
• Notation: E[xiεi] = 0 or E[Xjε] = 0
• If an explanatory variable and the error term were correlated with each other, OLS would be likely to attribute some of the variation in y to x when it actually came from the error term
• Example: impact of skipping classes on exam scores: motivated students are less likely to skip classes → negative correlation between the number of skipped classes and the error term
• Such correlation leads to biased and inconsistent estimates
• We will solve this problem using the instrumental variables (IV) approach

5. HOMOSKEDASTICITY 22 / 36
The error term has a constant variance: Var(εi | Xi) = σ²
• If this is not satisfied, we talk about heteroskedasticity
• The assumption states that each observation of the error term is drawn from a distribution with the same variance and thus varies in the same manner around the regression line
• If the error term is heteroskedastic, it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
• Technically: the OLS estimate will be consistent, but not efficient

5. HOMOSKEDASTICITY 23 / 36
• Heteroskedasticity is often present in cross-sectional data
• Example: analysis of household consumption patterns
§ The variance of the consumption of certain goods might be greater for higher-income households
§ These have more discretionary income than lower-income households do
• We will solve this problem using Huber-White robust standard errors

5. HOMOSKEDASTICITY - GRAPHICAL REPRESENTATION 24 / 36
(Figure: scatter of observations around the regression line; axes x and Y)

GRAPHICAL REPRESENTATION 25 / 36
(Figure: scatter of observations around the regression line; axes x and Y)
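As an illustration of the robust standard errors mentioned above, here is a hedged sketch (not from the slides; the data are simulated and statsmodels is used only as one possible tool). The error variance is made to grow with x, and Huber-White (heteroskedasticity-robust) standard errors are requested with cov_type='HC1':

```python
import numpy as np
import statsmodels.api as sm

# Simulate data whose error variance grows with x (heteroskedasticity)
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + rng.normal(scale=0.5 * x)   # larger x -> larger error variance

X = sm.add_constant(x)                      # adds the intercept column
conventional = sm.OLS(y, X).fit()           # conventional (non-robust) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # Huber-White robust SEs

print(conventional.bse)  # standard errors computed assuming homoskedasticity
print(robust.bse)        # heteroskedasticity-robust standard errors
```

The point estimates are identical in both fits; only the estimated standard errors (and hence t-statistics and confidence intervals) change.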
6. NORMALITY OF THE ERROR TERM 26 / 36
The error term is normally distributed.
• This is an empirical question
• Normality of the error term is inherited by the estimate β̂
• Knowing the distribution of the estimate allows us to find its confidence intervals and to test hypotheses about the coefficients

PROPERTIES OF THE OLS ESTIMATE 27 / 36
• The OLS estimate is defined by the formula
  β̂ = (X′X)⁻¹X′y, where y = Xβ + ε
• Hence, it depends on the random variable ε and is therefore a random variable itself
• The properties of β̂ are based on the properties of ε

EXPECTED VALUE OF THE OLS ESTIMATOR 28 / 36
• Under assumptions 1-4, OLS is unbiased:
  E[β̂] = β
• The estimated coefficients may be smaller or larger, depending on the sample
• However, on average, they will be equal to the true parameters
• NOTE: in a given sample, estimates may differ considerably from the true values

VARIANCE OF THE OLS ESTIMATOR 29 / 36
• Under assumptions 1-5, OLS is efficient:
  Var(β̂ | X) = σ²(X′X)⁻¹
• The error variance σ² increases the variance of the estimator
• The variation in the explanatory variables reduces the variance of the estimator

GAUSS-MARKOV THEOREM 30 / 36
Under assumptions 1-5, the OLS estimator of β is the best linear unbiased estimator (BLUE) of the regression coefficients.
• NOTE: assumption 6, normality, is not needed for this theorem
• The Gauss-Markov theorem means that among all linear unbiased estimators of the regression coefficients, OLS has the smallest variance

EXPECTED VALUE OF THE OLS ESTIMATE (OPTIONAL) 31 / 36

VARIANCE OF THE OLS ESTIMATE (OPTIONAL) 32 / 36

NORMALITY OF THE OLS ESTIMATE 33 / 36

CONSISTENCY OF THE OLS ESTIMATE 34 / 36
• When no explanatory variables are correlated with the error term (assumption 4), the OLS estimate is consistent:
  plim β̂ = β (as n → ∞)
• In other words: as the number of observations increases, the estimate converges to the true value of the coefficient

CONSISTENCY OF THE OLS ESTIMATE 35 / 36
• As long as the OLS estimate β̂ of β is consistent, the residuals are consistent estimates of the error term
• If we have consistent estimates of the error term, we can test whether it satisfies the classical assumptions
• Moreover, possible deviations from the classical model can be corrected
• As a consequence, the assumption of zero correlation between the explanatory variables and the error term is the most important one to satisfy in regression models

SUMMARY 36 / 36
• We expressed the multivariate OLS model in matrix notation, y = Xβ + ε, and we found the formula of the estimate:
  β̂ = (X′X)⁻¹X′y
• We listed the classical assumptions of regression models:
§ model linear in parameters, random sampling, explanatory variables linearly independent
§ (normally distributed) error term with zero mean and constant variance
§ no correlation between the error term and the explanatory variables
• We showed that if these assumptions hold, the OLS estimate is
§ consistent (if no correlation between X and ε)
§ unbiased (if no correlation between X and ε)
§ efficient (if homoskedasticity and no autocorrelation of ε)
§ normally distributed (if ε normally distributed)
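To connect the summary to something concrete, here is a minimal Monte Carlo sketch (illustrative only, not part of the lecture; the data-generating process and parameter values are assumptions of the example). Across repeated samples the OLS slope estimate averages to the true value (unbiasedness), and its sampling dispersion shrinks as the sample size grows (consistency):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 1.0, 2.0  # true parameters of the simulated model

def ols_slope(n):
    """Draw one sample of size n and return the OLS slope estimate."""
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    x_dev = x - x.mean()
    return (x_dev @ (y - y.mean())) / (x_dev @ x_dev)

for n in (25, 100, 1000):
    draws = np.array([ols_slope(n) for _ in range(2000)])
    # The mean stays close to 2 for every n (unbiasedness),
    # while the standard deviation shrinks as n grows (consistency).
    print(f"n={n}: mean={draws.mean():.3f}, std={draws.std():.3f}")
```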