LECTURE 3
Introduction to Econometrics
INTRODUCTION TO LINEAR REGRESSION ANALYSIS II
October 6, 2017

REVISION: THE PREVIOUS LECTURE

(Desired) properties of an estimator:
An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it is estimating
An estimator is consistent if it converges to the value of the true parameter as the sample size increases
An estimator is efficient if the variance of its sampling distribution is the smallest possible

We explained the principle of the OLS estimator: minimizing the sum of squared differences between the observations and the regression line

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

We found the formulae for the estimates:

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}, \qquad \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n

We explained that the stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
Remember that all of these factors are included in the error term and may alter its properties
The properties of the error term determine the properties of the estimates

WARM-UP EXERCISE

You receive a unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Naturally, you are curious about the effect of experience on wages.

You run an OLS regression of monthly wage in CZK on the number of years of experience and obtain the following results:

\widehat{wage}_i = 14450 + 1135 \cdot exper_i

1. Interpret the meaning of the coefficient of exper_i.
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic? Explain your answer.
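A quick way to answer question 2 is to plug the four experience levels into the fitted line; a minimal sketch in plain Python (the function name is just illustrative):

```python
def predicted_wage(exper: float) -> float:
    # Fitted line from the warm-up: wage = 14450 + 1135 * exper
    return 14450 + 1135 * exper

for years in (1, 5, 20, 40):
    print(f"{years:>2} years of experience -> {predicted_wage(years):6.0f} CZK")
# 1 -> 15585, 5 -> 20125, 20 -> 37150, 40 -> 59850
```

Note that the linear specification forces every additional year to add the same 1135 CZK, no matter how experienced the worker already is; this is what the quadratic specification in the exercise below relaxes.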
ON TODAY'S LECTURE

We will derive estimation formulas for multivariate OLS
We will list the assumptions about the error term and the explanatory variables that are required in classical regression models
We will show that under these assumptions, OLS is the best estimator available for regression models
The rest of the course will mostly deal, in one way or another, with what to do when one of the classical assumptions is not met

Readings:
Studenmund - chapter 4
Wooldridge - chapters 5, 8, 9, 12

ORDINARY LEAST SQUARES WITH SEVERAL EXPLANATORY VARIABLES

Usually, there is more than one explanatory variable in a regression model
Multivariate model with k explanatory variables:

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + \varepsilon_i

For observations 1, 2, ..., n, we have:

y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \ldots + \beta_k x_{1k} + \varepsilon_1
y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \ldots + \beta_k x_{2k} + \varepsilon_2
\vdots
y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \ldots + \beta_k x_{nk} + \varepsilon_n

MATRIX NOTATION

We can write in matrix form:

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}

or in a simplified notation: y = X\beta + \varepsilon

OLS - DERIVATION UNDER MATRIX NOTATION

We have to find

\hat{\beta} = \arg\min_{\beta} (y - X\beta)'(y - X\beta) = \arg\min_{\beta} \left( y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \right)

FOC:

\frac{\partial}{\partial \beta}: \; -X'y - X'y + \left( X'X + (X'X)' \right)\beta = 0 \;\Rightarrow\; X'X\beta = X'y

This gives us

\hat{\beta} = (X'X)^{-1} X'y
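A minimal numpy sketch of this closed-form estimator (the ols name and the toy data are illustrative; solving the normal equations X'Xb = X'y is numerically safer than forming the inverse explicitly):

```python
import numpy as np

def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS estimate beta_hat = (X'X)^{-1} X'y.

    X is the n x (k+1) design matrix whose first column is ones
    (the intercept); y is the vector of n observations.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy usage: data generated from y = 1 + 2*x1 - 0.5*x2 + noise
rng = np.random.default_rng(0)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)
print(ols(X, y))  # approximately [1.0, 2.0, -0.5]
```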
MEANING OF REGRESSION COEFFICIENT

Consider the multivariate model

Q = \beta_0 + \beta_1 P + \beta_2 P_s + \beta_3 Y + \varepsilon

estimated as

\hat{Q} = 31.50 - 0.73 P + 0.11 P_s + 0.23 Y

Q ... quantity demanded
P ... commodity's price
P_s ... price of substitute
Y ... disposable income

The meaning of \beta_1 is the impact of a one-unit increase in P on the dependent variable Q, holding constant the other included independent variables P_s and Y
When the price increases by 1 unit (and the price of the substitute good and income remain the same), the quantity demanded decreases by 0.73 units

EXERCISE

Remember the unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Because you realize that wages may not be linearly dependent on experience, you add an additional variable exper_i^2 to your model and obtain the following results:

\widehat{wage}_i = 14450 + 1160 \cdot exper_i - 25 \cdot exper_i^2

1. What is the overall impact of increasing the number of years of experience by 1 year?
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic now? Explain your answer.
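A minimal sketch for questions 1 and 2, treating the marginal effect as the derivative of the fitted quadratic, 1160 - 50 * exper (plain Python; names are illustrative):

```python
def predicted_wage(exper: float) -> float:
    # Fitted quadratic: wage = 14450 + 1160*exper - 25*exper^2
    return 14450 + 1160 * exper - 25 * exper ** 2

def marginal_effect(exper: float) -> float:
    # d(wage)/d(exper) = 1160 - 50*exper
    return 1160 - 50 * exper

for years in (1, 5, 20, 40):
    print(years, predicted_wage(years), marginal_effect(years))
# wages: 15585, 19625, 27650, 20850 CZK
# the fitted wage profile peaks at exper = 1160/50 = 23.2 years
```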
THE CLASSICAL ASSUMPTIONS

1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term
2. The error term has a zero population mean
3. Observations of the error term are uncorrelated with each other
4. The error term has a constant variance
5. All explanatory variables are uncorrelated with the error term
6. No explanatory variable is a perfect linear function of any other explanatory variable(s)
7. The error term is normally distributed

GRAPHICAL REPRESENTATION
[Figure: observations of Y scattered around the regression line in X]

1. LINEARITY IN COEFFICIENTS

The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
Linearity in variables is not required
Example: the production function Y = A K^{\beta_1} L^{\beta_2}, for which we suppose A = e^{\beta_0 + \varepsilon}, can be transformed so that

\ln Y = \beta_0 + \beta_1 \ln K + \beta_2 \ln L + \varepsilon

and the linearity in coefficients is restored
Note that it is the linearity in coefficients that allows us to rewrite the general regression model in matrix form

EXERCISE

Which of the following models is/are linear?

y = \beta_0 + \beta_1 x + \varepsilon is a linear model
\ln y = \beta_0 + \beta_1 \ln x + \beta_2 \sqrt{z} + \varepsilon is a linear model
y = x^{\beta_1} + \varepsilon is NOT a linear model

Regression models are linear in parameters, but they do not need to be linear in variables
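A minimal numpy sketch of the production-function example above, assuming Cobb-Douglas data: after taking logs the model is linear in the coefficients, so ordinary OLS recovers β0, β1, β2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
K = rng.uniform(1, 10, size=n)                 # capital
L = rng.uniform(1, 10, size=n)                 # labor
eps = rng.normal(scale=0.1, size=n)
Y = np.exp(0.5 + eps) * K ** 0.3 * L ** 0.6    # Y = A K^b1 L^b2, A = e^(b0+eps)

# After the log transform the model is linear in the coefficients:
# ln Y = b0 + b1 ln K + b2 ln L + eps
X = np.column_stack([np.ones(n), np.log(K), np.log(L)])
beta_hat = np.linalg.solve(X.T @ X, X.T @ np.log(Y))
print(beta_hat)  # approximately [0.5, 0.3, 0.6]
```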
2. ZERO MEAN OF THE ERROR TERM

The error term has a zero population mean.
Notation: E[\varepsilon_i] = 0 or E[\varepsilon] = 0
Idea: observations are distributed around the regression line; the average of the deviations is zero
In fact, the mean of \varepsilon_i is forced to be zero by the existence of the intercept (\beta_0) in the equation
Hence, this assumption is satisfied as long as there is an intercept included in the equation

GRAPHICAL REPRESENTATION
[Figure: errors averaging out to zero around the regression line]

3. ERRORS UNCORRELATED WITH EACH OTHER

Observations of the error term are uncorrelated with each other.
If there is a systematic correlation between one observation of the error term and another (serial correlation), it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient
This often happens in time series data, where a random shock in one time period affects the random shock in another time period
We will solve this problem using the Generalized Least Squares estimator

GRAPHICAL REPRESENTATION
[Figure: true vs. estimated model under serially correlated errors, Y against X]
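A minimal simulation sketch of this point, assuming an AR(1) regressor and AR(1) errors (both with coefficient 0.8): across repeated samples the OLS slope stays centered on the truth, but the conventional standard-error formula understates its actual spread:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, reps = 200, 0.8, 2000

def ar1(n: int, rho: float, rng) -> np.ndarray:
    # z_t = rho * z_{t-1} + u_t with iid standard normal shocks u_t
    u = rng.normal(size=n)
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + u[t]
    return z

slopes, naive_ses = [], []
for _ in range(reps):
    x, eps = ar1(n, rho, rng), ar1(n, rho, rng)
    y = 1.0 + 2.0 * x + eps
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)               # conventional sigma^2 estimate
    slopes.append(b[1])
    naive_ses.append(np.sqrt(s2 * XtX_inv[1, 1]))

print(np.mean(slopes))     # ~2.0: the estimate is still centered on the truth
print(np.std(slopes))      # actual sampling spread of the slope
print(np.mean(naive_ses))  # conventional SE: noticeably smaller than the spread
```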
4. CONSTANT VARIANCE OF THE ERROR TERM

The error term has a constant variance.
This property is called homoskedasticity; if it is not satisfied, we talk about heteroskedasticity
It states that each observation of the error is drawn from a distribution with the same variance and thus varies in the same manner around the regression line
If the error term is heteroskedastic, it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient

Heteroskedasticity is often present in cross-sectional data
Example: analysis of household consumption patterns
The variance of the consumption of certain goods might be greater for higher-income households, as these have more discretionary income than lower-income households
We will solve this problem using Huber-White robust standard errors

GRAPHICAL REPRESENTATION
[Figure: true vs. estimated model under heteroskedastic errors, Y against X]

3. NO CORRELATION + 4. HOMOSKEDASTICITY

Notation:
no correlation: corr(\varepsilon_i, \varepsilon_j) = 0, i.e. E[\varepsilon_i \varepsilon_j] = 0 for each i \neq j
homoskedasticity: E[\varepsilon_i^2] = \sigma^2 for each i

Matrix notation:

\mathrm{Var}[\varepsilon] =
\begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
0 & 0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2 I
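When Var[ε] ≠ σ²I because of heteroskedasticity, the remedy mentioned above is the Huber-White "sandwich" covariance estimator; a minimal numpy sketch (the HC0 variant, with toy data; an illustration rather than the lecture's formal treatment):

```python
import numpy as np

def ols_with_robust_se(X: np.ndarray, y: np.ndarray):
    """OLS point estimates with Huber-White (HC0) standard errors.

    Sandwich formula: Var[b] = (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1},
    where e_i are the OLS residuals.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    meat = X.T @ (e[:, None] ** 2 * X)   # X' diag(e^2) X
    cov = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(cov))

# Toy usage: the error's standard deviation grows with x
rng = np.random.default_rng(3)
n = 1000
x = rng.uniform(1, 5, size=n)
y = 1 + 2 * x + rng.normal(scale=x, size=n)
X = np.column_stack([np.ones(n), x])
b, se = ols_with_robust_se(X, y)
print(b, se)
```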
5. VARIABLES UNCORRELATED WITH THE ERROR TERM

All explanatory variables are uncorrelated with the error term.
Notation: E[x_i \varepsilon_i] = 0 or E[X'\varepsilon] = 0
If an explanatory variable and the error term were correlated with each other, the OLS estimates would be likely to attribute to x some of the variation in y that actually came from the error term
Example: analysis of household consumption patterns
Households with lower incomes may report higher consumption (because of shame)
This creates a negative correlation between X and the error term (the measurement error is larger for lower incomes)
This leads to biased and inconsistent estimates
We will solve this problem using the IV (instrumental variables) approach

GRAPHICAL REPRESENTATION
[Figure: true vs. estimated model when a regressor is correlated with the error term, Y against X]
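A minimal simulation sketch of such a violation, assuming classical measurement error in the regressor (one simple way to make the observed x correlated with the composite error term): the slope is biased toward zero, and no amount of data fixes it:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000                            # large n: this is not a small-sample problem
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(scale=0.5, size=n)
x_obs = x_true + rng.normal(size=n)    # we only observe x with noise

# Regressing y on x_obs: the composite error (eps - 2 * measurement noise)
# is correlated with x_obs, violating assumption 5
X = np.column_stack([np.ones(n), x_obs])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b[1])  # ~1.0 instead of 2.0: attenuation bias, inconsistent
```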
6. LINEARLY INDEPENDENT VARIABLES

No explanatory variable is a perfect linear function of any other explanatory variable(s).
If this condition does not hold, we talk about (multi)collinearity
Multicollinearity can be perfect or imperfect
Perfect multicollinearity: one explanatory variable is an exact linear function of one or more other explanatory variables
In this case, the OLS model is incapable of distinguishing one variable from the other
Technical consequence: (X'X)^{-1} does not exist, so OLS estimation cannot be conducted
Example: we include dummy variables for men and women together with the intercept
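A minimal numpy illustration of that dummy-variable example: with an intercept plus dummies for both sexes, the columns of X are perfectly collinear (ones = male + female), so X'X cannot be inverted:

```python
import numpy as np

male = np.array([1, 0, 1, 1, 0])
female = 1 - male                        # exact linear function of male
X = np.column_stack([np.ones(5), male, female])

print(np.linalg.matrix_rank(X.T @ X))   # 2 instead of 3: rank deficient
try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError as err:
    print("(X'X)^{-1} does not exist:", err)
```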
6. LINEARLY INDEPENDENT VARIABLES

Imperfect multicollinearity:
There is a linear relationship between the variables, but there is some error in that relationship
Example: we include two variables that proxy for individual health status

Consequences of multicollinearity:
Estimated coefficients remain unbiased
But the standard errors of the estimates are inflated, which can make variables appear insignificant even though they might be significant
Solution: drop one of the variables

EXERCISE

Which of the following pairs of independent variables would violate the assumption of no multicollinearity? (That is, which pairs of variables are perfect linear functions of each other?)

right shoe size and left shoe size (of students in the class)
consumption and disposable income (in the United States over the last 30 years)
X_i and 2X_i
X_i and (X_i)^2
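A minimal simulation sketch of imperfect multicollinearity, assuming two nearly collinear regressors that each truly matter: the coefficient estimates stay centered on the truth, but their sampling spread is inflated:

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 2000, 200
b1_draws = []
for _ in range(reps):
    common = rng.normal(size=n)
    x1 = common + rng.normal(scale=0.1, size=n)   # x1 and x2 share the same
    x2 = common + rng.normal(scale=0.1, size=n)   # driver: correlation ~0.99
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b1_draws.append(np.linalg.solve(X.T @ X, X.T @ y)[1])

print(np.mean(b1_draws))  # ~0.5: still unbiased
print(np.std(b1_draws))   # large: near-collinearity inflates the standard error
```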
7. NORMALITY OF THE ERROR TERM

The error term is normally distributed.
This assumption is optional, but it is usually invoked
Normality of the error term is inherited by the estimate β̂
Knowing the distribution of the estimate allows us to find its confidence intervals and to test hypotheses about the coefficients

PROPERTIES OF THE OLS ESTIMATE

The OLS estimate is defined by the formula

\hat{\beta} = (X'X)^{-1} X'y, \quad \text{where } y = X\beta + \varepsilon

Hence, it depends on the random variable ε, and thus β̂ is a random variable itself
The properties of β̂ are based on the properties of ε

GAUSS-MARKOV THEOREM

Given Classical Assumptions 1-6, the OLS estimator of β is the minimum variance estimator from among the set of all linear unbiased estimators of β.

Assumption 7, normality, is not needed for this theorem
The theorem is also known as the statement "OLS is BLUE", where BLUE stands for "Best Linear Unbiased Estimator"
It means that:
OLS is linear: \hat{\beta} = (X'X)^{-1} X'y = Ly, a linear function of y
OLS is unbiased (see below)
OLS has the minimum variance of all linear unbiased estimators (it is efficient)

EXPECTED VALUE OF THE OLS ESTIMATE

We show:

\hat{\beta} = (X'X)^{-1} X'y = (X'X)^{-1} X'(X\beta + \varepsilon) = \underbrace{(X'X)^{-1} X'X}_{I} \beta + (X'X)^{-1} X'\varepsilon = \beta + (X'X)^{-1} X'\varepsilon

E[\hat{\beta}] = E\left[ \beta + (X'X)^{-1} X'\varepsilon \right] = \beta + (X'X)^{-1} X' \underbrace{E[\varepsilon]}_{0} = \beta

Since E[β̂] = β, OLS is unbiased

VARIANCE OF THE OLS ESTIMATE

We show:

\mathrm{Var}[\hat{\beta}] = \mathrm{Var}\left[ \beta + (X'X)^{-1} X'\varepsilon \right] = (X'X)^{-1} X' \underbrace{\mathrm{Var}[\varepsilon]}_{\sigma^2 I} X (X'X)^{-1} = \sigma^2 (X'X)^{-1} X'X (X'X)^{-1} = \sigma^2 (X'X)^{-1}
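A minimal Monte Carlo sketch checking both derivations at once: holding X fixed across repeated samples, the mean of β̂ matches β and its empirical covariance matches σ²(X'X)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma, reps = 100, 1.5, 5000
beta = np.array([1.0, 2.0, -0.5])

X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    draws[r] = XtX_inv @ X.T @ y

print(draws.mean(axis=0))    # ~ beta: unbiasedness
print(np.cov(draws.T))       # empirical covariance of beta_hat
print(sigma ** 2 * XtX_inv)  # theoretical sigma^2 (X'X)^{-1}: nearly identical
```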
NORMALITY OF THE OLS ESTIMATE

When we assume that \varepsilon_i \sim N(0, \sigma^2), we can see that

\hat{\beta} = (X'X)^{-1} X'y = \beta + (X'X)^{-1} X'\varepsilon

is also normally distributed (it is a linear combination of normally distributed variables)
Hence, we say that β̂ is jointly normal:

\hat{\beta} \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right)

This will help us to test hypotheses about regression coefficients (see next lecture)
Note that the normality of errors is not required in large samples, because β̂ is asymptotically normal anyway

CONSISTENCY OF THE OLS ESTIMATE

When no explanatory variables are correlated with the error term (Assumption 5), the OLS estimate is consistent:

E[X'\varepsilon] = 0 \;\Rightarrow\; \hat{\beta} \xrightarrow{\,n \to \infty\,} \beta

In other words: as the number of observations increases, the estimate converges to the true value of the coefficient
Consistency is the most important property of any estimate!
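A minimal simulation sketch of consistency: re-estimating the same regression on ever larger samples drives the slope estimate toward its true value of 2:

```python
import numpy as np

rng = np.random.default_rng(7)
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(f"n = {n:>6}: slope estimate = {b[1]:.4f}")
# the estimate settles ever closer to 2 as n grows
```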
CONSISTENCY OF THE OLS ESTIMATE

As long as the OLS estimate of β is consistent, the residuals are consistent estimates of the error term
If we have consistent estimates of the error term, we can test whether it satisfies the classical assumptions
Moreover, possible deviations from the classical model can be corrected
As a consequence, the assumption of zero correlation between the explanatory variables and the error term, E[X'\varepsilon] = 0, is the most important one to satisfy in regression models

SUMMARY

We expressed the multivariate OLS model in matrix notation, y = X\beta + \varepsilon, and we found the formula for the estimate:

\hat{\beta} = (X'X)^{-1} X'y

We listed the classical assumptions of regression models:
model linear in parameters, explanatory variables linearly independent
(normally distributed) error term with zero mean and constant variance, no serial correlation
no correlation between the error term and the explanatory variables

We showed that if these assumptions hold, the OLS estimate is:
consistent (if no correlation between X and ε)
unbiased (if no correlation between X and ε)
efficient (if homoskedasticity and no autocorrelation of ε)
normally distributed (if ε normally distributed)