LECTURE 3 1 / 36
Introduction to Econometrics
INTRODUCTION TO LINEAR REGRESSION ANALYSIS II
Dali Laxton
October 7, 2022

REVISION: THE PREVIOUS LECTURE 2 / 36
• (Desired) properties of an estimator:
§ An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it is estimating
§ An estimator is consistent if it converges to the value of the true parameter as the sample size increases
§ An estimator is efficient if the variance of its sampling distribution is the smallest possible

REVISION: THE PREVIOUS LECTURE 3 / 36
• We explained the principle of the OLS estimator: minimizing the sum of squared differences between the observations and the regression line
  yi = β0 + β1xi + εi
• We found the formulae for the estimates:
  β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²,  β̂0 = ȳ − β̂1x̄

REVISION: THE PREVIOUS LECTURE 4 / 36
• We explained that the stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
• Remember that all of these factors are included in the error term and may alter its properties
• The properties of the error term determine the properties of the estimates

WARM-UP EXERCISE 5 / 36
• You receive a unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Naturally, you are very curious about the effect of experience on wages.
• You run an OLS regression of monthly wage in CZK on the number of years of experience and obtain the following results:
1. Interpret the meaning of the coefficient of experi.
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic? Explain your answer.

ON TODAY'S LECTURE 6 / 36
• We will derive the estimation formula for multivariate OLS
• We will list the assumptions about the error term and the explanatory variables that are required in classical regression models
• We will show that under these assumptions, OLS is the best estimator available for regression models
• The rest of the course will mostly deal, in one way or another, with the question of what to do when one of the classical assumptions is not met
• Readings: Studenmund, chapter 4; Wooldridge, chapters 5, 8, 9, 12

ORDINARY LEAST SQUARES WITH SEVERAL EXPLANATORY VARIABLES 7 / 36
• Usually, there is more than one explanatory variable in a regression model
• Multivariate model with k explanatory variables:
  yi = β0 + β1xi1 + β2xi2 + . . . + βkxik + εi
• For observations 1, 2, . . . , n, we have:
  y1 = β0 + β1x11 + β2x12 + . . . + βkx1k + ε1
  y2 = β0 + β1x21 + β2x22 + . . . + βkx2k + ε2
  ...
  yn = β0 + β1xn1 + β2xn2 + . . . + βkxnk + εn

MATRIX NOTATION 8 / 36
• We can write this system in matrix form: y is the n × 1 vector of observations of the dependent variable, X is the n × (k + 1) matrix containing a column of ones (for the intercept) and the observations of the k explanatory variables, β is the (k + 1) × 1 vector of coefficients, and ε is the n × 1 vector of error terms
• In simplified notation:
  y = Xβ + ε

OLS - DERIVATION UNDER MATRIX NOTATION (OPTIONAL) 9 / 36
• Minimizing the sum of squared residuals (y − Xβ)′(y − Xβ) with respect to β yields the OLS estimate
  β̂ = (X′X)⁻¹X′y
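To make the matrix formula concrete, here is a minimal numerical sketch (not part of the original slides; the data are simulated and all names are invented for the example). It computes β̂ = (X′X)⁻¹X′y with numpy and cross-checks the result against numpy's least-squares solver:

```python
import numpy as np

# Simulate a small dataset: y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 + 3 * x1 - 1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix X: a column of ones (intercept) plus the regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)    # approximately [2, 3, -1]
print(beta_lstsq)  # same values
```

In practice, solving the least-squares problem directly (np.linalg.lstsq or np.linalg.solve) is numerically more stable than explicitly inverting X′X; the explicit inverse is shown here only to mirror the formula.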
MEANING OF REGRESSION COEFFICIENT 10 / 36
• Consider the multivariate model
  Q = β0 + β1P + β2Ps + β3Y + ε
  estimated as
  Q̂ = 31.50 − 0.73P + 0.11Ps + 0.23Y
  where Q is the quantity demanded, P the commodity's price, Ps the price of a substitute, and Y disposable income
• The meaning of β1 is the impact of a one-unit increase in P on the dependent variable Q, holding constant the other included independent variables Ps and Y
• When price increases by 1 unit (and the price of the substitute good and income remain the same), quantity demanded decreases by 0.73 units

EXERCISE 11 / 36
• Remember the unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working).
• Because you realize that wages may not be linearly dependent on experience, you add an additional variable exper²i into your model and obtain the following results:
  ŵagei = 14450 + 1160 · experi − 25 · exper²i
1. What is the overall impact of increasing the number of years of experience by 1 year?
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic now? Explain your answer.

THE CLASSICAL ASSUMPTIONS 12 / 36
1. Linearity: the regression model is linear in the parameters (coefficients)
2. Random sampling: the data is a random sample drawn from the population and each data point follows the population equation
3. No perfect collinearity: the values of the explanatory variables are not all the same and no explanatory variable is a perfect linear function of any other explanatory variable(s)
4. Zero conditional mean: the values of the explanatory variables must contain no information about the mean of the unobserved factors, i.e. the explanatory variables are uncorrelated with the error term
5. Homoskedasticity: the error term has a constant variance
6. Normality of the error term: the error term is normally distributed

1. LINEARITY IN PARAMETERS 13 / 36
The regression model is linear in coefficients.
• Linearity in variables is not required
• Example: the production function Y = A · K^β1 · L^β2, for which we suppose A = exp(β0 + ε), can be transformed so that
  ln Y = β0 + β1 ln K + β2 ln L + ε
  and linearity in coefficients is restored
• Note that it is the linearity in coefficients that allows us to rewrite the general regression model in matrix form

EXERCISE 14 / 36
Which of the following models is/are linear?

EXERCISE 15 / 36
Which of the following models is/are linear?

2. RANDOM SAMPLING 16 / 36
The data is a random sample drawn from the population and each data point follows the population equation.
• Discussed during the last class

3. NO PERFECT COLLINEARITY 17 / 36
The values of the explanatory variables are not all the same and no explanatory variable is a perfect linear function of any other explanatory variable(s).
• If this condition does not hold, we talk about (multi)collinearity
• Multicollinearity can be perfect or imperfect
• Perfect multicollinearity: one explanatory variable is an exact linear function of one or more other explanatory variables
§ In this case, OLS is incapable of distinguishing one variable from the other
§ OLS estimation cannot be conducted
§ Example: we include dummy variables for men and women together with the intercept
• Mathematically, perfect collinearity makes the matrix X′X singular (its determinant is zero), so it cannot be inverted in the formula β̂ = (X′X)⁻¹X′y
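A minimal sketch of why perfect collinearity breaks the estimation (simulated data, invented names): when one column of X is an exact linear function of another, X′X is rank-deficient, so (X′X)⁻¹ does not exist and the OLS formula has no unique solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)

# The third column is an exact multiple of the second: perfect collinearity
X = np.column_stack([np.ones(n), x, 2 * x])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2 instead of 3: X'X is rank-deficient
print(np.linalg.det(XtX))          # zero (up to floating-point error)

# Because (X'X)^{-1} does not exist, beta_hat = (X'X)^{-1} X'y is not defined;
# regression software typically drops one of the collinear variables or stops
# with an error.
```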
3. NO PERFECT COLLINEARITY 18 / 36
• Imperfect multicollinearity: there is a linear relationship between the variables, but there is some error in that relationship
§ Example: we include two variables that both proxy for individual health status
• Consequences of multicollinearity:
§ Estimated coefficients remain unbiased
§ But the standard errors of the estimates are inflated, making variables appear insignificant even though they might be significant
• Solution: drop one of the variables

EXERCISE 19 / 36
• Which of the following pairs of independent variables would violate the assumption of no multicollinearity? (That is, which pairs of variables are perfect linear functions of each other?)
§ right shoe size and left shoe size (of students in the class)
§ consumption and disposable income (in the United States over the last 30 years)
§ Xi and 2Xi
§ Xi and (Xi)²

4. ZERO CONDITIONAL MEAN: ZERO POPULATION MEAN 20 / 36
The error term has a zero population mean.
• Notation: E[εi] = 0 or E[ε] = 0
• Idea: observations are distributed around the regression line, and the average of the deviations is zero
• On average, we make no "mistakes"
• This assumption is satisfied as long as an intercept is included in the equation
§ Why the intercept matters: with y = β0 + β1x + ε, taking expectations gives E(y) = β0 + β1E(x) + E(ε); the OLS first-order conditions force the fitted line through the sample means, ȳ = β̂0 + β̂1x̄, so the residuals average to zero and any nonzero mean of the error is absorbed into the intercept

4. ZERO CONDITIONAL MEAN 21 / 36
All explanatory variables are uncorrelated with the error term.
• Notation: E[xiεi] = 0 or E[Xjε] = 0
• If an explanatory variable and the error term were correlated with each other, OLS would be likely to attribute some of the variation in y to x when it actually came from the error term
• Example: impact of skipping classes on exam scores: motivated students are less likely to skip classes → negative correlation between the number of skipped classes and the error term
• Such correlation leads to biased and inconsistent estimates
• We will solve this problem using the instrumental variables (IV) approach

5. HOMOSKEDASTICITY 22 / 36
The error term has a constant variance: Var(εi | Xi) = σ²
• If this is not satisfied, we talk about heteroskedasticity
• The assumption states that each observation of the error term is drawn from a distribution with the same variance and thus varies in the same manner around the regression line
• If the error term is heteroskedastic, it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
• Technically: the OLS estimate will be consistent, but not efficient

5. HOMOSKEDASTICITY 23 / 36
• Heteroskedasticity is often present in cross-sectional data
• Example: analysis of household consumption patterns
§ The variance of the consumption of certain goods might be greater for higher-income households
§ These have more discretionary income than lower-income households do
• We will solve this problem using Huber-White robust standard errors

5. HOMOSKEDASTICITY - GRAPHICAL REPRESENTATION 24 / 36
(Figure: scatter of observations around the regression line; axes x and Y)

GRAPHICAL REPRESENTATION 25 / 36
(Figure: scatter of observations around the regression line; axes x and Y)
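As an illustration of the robust standard errors mentioned above, here is a hedged sketch (not from the slides; the data are simulated and statsmodels is used only as one possible tool). The error variance is made to grow with x, and Huber-White (heteroskedasticity-robust) standard errors are requested with cov_type='HC1':

```python
import numpy as np
import statsmodels.api as sm

# Simulate data whose error variance grows with x (heteroskedasticity)
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * x + rng.normal(scale=0.5 * x)   # larger x -> larger error variance

X = sm.add_constant(x)                      # adds the intercept column
conventional = sm.OLS(y, X).fit()           # conventional (non-robust) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")   # Huber-White robust SEs

print(conventional.bse)  # standard errors computed assuming homoskedasticity
print(robust.bse)        # heteroskedasticity-robust standard errors
```

The point estimates are identical in both fits; only the estimated standard errors (and hence t-statistics and confidence intervals) change.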
6. NORMALITY OF THE ERROR TERM 26 / 36
The error term is normally distributed.
• This is an empirical question
• Normality of the error term is inherited by the estimate β̂
• Knowing the distribution of the estimate allows us to find its confidence intervals and to test hypotheses about the coefficients

PROPERTIES OF THE OLS ESTIMATE 27 / 36
• The OLS estimate is defined by the formula
  β̂ = (X′X)⁻¹X′y, where y = Xβ + ε
• Hence, it depends on the random variable ε and is therefore a random variable itself
• The properties of β̂ are based on the properties of ε

EXPECTED VALUE OF THE OLS ESTIMATOR 28 / 36
• Under assumptions 1-4, OLS is unbiased:
  E[β̂] = β
• The estimated coefficients may be smaller or larger, depending on the sample
• However, on average, they will be equal to the true parameters
• NOTE: in a given sample, estimates may differ considerably from the true values

VARIANCE OF THE OLS ESTIMATOR 29 / 36
• Under assumptions 1-5, OLS is efficient:
  Var(β̂ | X) = σ²(X′X)⁻¹
• The error variance σ² increases the variance of the estimator
• The variation in the explanatory variables reduces the variance of the estimator

GAUSS-MARKOV THEOREM 30 / 36
Under assumptions 1-5, the OLS estimator of β is the best linear unbiased estimator (BLUE) of the regression coefficients.
• NOTE: assumption 6, normality, is not needed for this theorem
• The Gauss-Markov theorem means that among all linear unbiased estimators of the regression coefficients, OLS has the smallest variance

EXPECTED VALUE OF THE OLS ESTIMATE (OPTIONAL) 31 / 36

VARIANCE OF THE OLS ESTIMATE (OPTIONAL) 32 / 36

NORMALITY OF THE OLS ESTIMATE 33 / 36

CONSISTENCY OF THE OLS ESTIMATE 34 / 36
• When no explanatory variables are correlated with the error term (assumption 4), the OLS estimate is consistent:
  plim β̂ = β (as n → ∞)
• In other words: as the number of observations increases, the estimate converges to the true value of the coefficient

CONSISTENCY OF THE OLS ESTIMATE 35 / 36
• As long as the OLS estimate β̂ of β is consistent, the residuals are consistent estimates of the error term
• If we have consistent estimates of the error term, we can test whether it satisfies the classical assumptions
• Moreover, possible deviations from the classical model can be corrected
• As a consequence, the assumption of zero correlation between the explanatory variables and the error term is the most important one to satisfy in regression models

SUMMARY 36 / 36
• We expressed the multivariate OLS model in matrix notation, y = Xβ + ε, and we found the formula of the estimate:
  β̂ = (X′X)⁻¹X′y
• We listed the classical assumptions of regression models:
§ model linear in parameters, random sampling, explanatory variables linearly independent
§ (normally distributed) error term with zero mean and constant variance
§ no correlation between the error term and the explanatory variables
• We showed that if these assumptions hold, the OLS estimate is
§ consistent (if no correlation between X and ε)
§ unbiased (if no correlation between X and ε)
§ efficient (if homoskedasticity and no autocorrelation of ε)
§ normally distributed (if ε normally distributed)
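To connect the summary to something concrete, here is a minimal Monte Carlo sketch (illustrative only, not part of the lecture; the data-generating process and parameter values are assumptions of the example). Across repeated samples the OLS slope estimate averages to the true value (unbiasedness), and its sampling dispersion shrinks as the sample size grows (consistency):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 1.0, 2.0  # true parameters of the simulated model

def ols_slope(n):
    """Draw one sample of size n and return the OLS slope estimate."""
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)
    x_dev = x - x.mean()
    return (x_dev @ (y - y.mean())) / (x_dev @ x_dev)

for n in (25, 100, 1000):
    draws = np.array([ols_slope(n) for _ in range(2000)])
    # The mean stays close to 2 for every n (unbiasedness),
    # while the standard deviation shrinks as n grows (consistency).
    print(f"n={n}: mean={draws.mean():.3f}, std={draws.std():.3f}")
```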