LECTURE 2
Introduction to Econometrics
INTRODUCTION TO LINEAR REGRESSION ANALYSIS I.
October 6, 2017

PREVIOUS LECTURE...

Introduction, organization, review of statistical background:
- random variables
- mean, variance, standard deviation
- covariance, correlation, independence
- normal distribution
- standardized random variables
WARM-UP EXERCISE

What is the correlation between X and Y?

 X  |  Y
 5  | 10
 3  |  6
−1  | −4
 6  |  8
 2  |  5

- Correlation: $\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
- Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
- Standard deviation: $\sigma_X = \sqrt{\mathrm{Var}[X]}$
- Variance: $\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

LECTURE 2.

Introduction to simple linear regression analysis:
- Sampling and estimation
- OLS principle

Readings:
- Studenmund, A. H., Using Econometrics: A Practical Guide, Chapters 1, 2.1, 17.2, 17.3
- Wooldridge, J. M., Introductory Econometrics: A Modern Approach, Chapters 2.1, 2.2

SAMPLING

- Population: the entire group of items that interests us
- Sample: the part of the population that we actually observe
- Statistical inference: the use of the sample to draw conclusions about the characteristics of the population from which the sample came
- Examples: medical experiments, opinion polls
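Returning to the warm-up exercise above, the formulas for covariance, standard deviation and correlation can be checked numerically on the five (X, Y) pairs. This is a minimal sketch that treats each pair as equally likely (dividing by n); using n − 1 instead would change the covariance and the variances but not the correlation, because the factor cancels in the ratio.

```python
import math

# Data from the warm-up exercise: five (X, Y) pairs.
x = [5, 3, -1, 6, 2]
y = [10, 6, -4, 8, 5]

def mean(values):
    """Arithmetic mean, used here as E[.] with equally likely observations."""
    return sum(values) / len(values)

def covariance(a, b):
    """Cov(A, B) = E[(A - E[A])(B - E[B])]."""
    a_bar, b_bar = mean(a), mean(b)
    return mean([(ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)])

def std_dev(a):
    """sigma_A = sqrt(Var[A]), where Var[A] = Cov(A, A) = E[(A - E[A])^2]."""
    return math.sqrt(covariance(a, a))

corr = covariance(x, y) / (std_dev(x) * std_dev(y))
print(f"Cov = {covariance(x, y):.2f}, Corr = {corr:.3f}")
```

With these five pairs, E[X] = 3, E[Y] = 5, Cov(X, Y) = 11, and the correlation is about 0.93, a strong positive linear relationship.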
RANDOM SAMPLING VS SELECTION BIAS

- Correct statistical inference can be performed only on a random sample, that is, a sample that reflects the true distribution of the population
- Biased sample: any sample that differs systematically from the population that it is intended to represent
- Selection bias: occurs when the selection of the sample systematically excludes or underrepresents certain groups
  - Example: an opinion poll about tuition payments conducted among undergraduate students rather than among all citizens
- Self-selection bias: occurs when we examine data for a group of people who have chosen to be in that group
  - Example: accident records of people who buy collision insurance
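The tuition-poll example of selection bias can be illustrated with a small simulation. All numbers here are hypothetical assumptions chosen for illustration: a population in which 10% are students, students support lower tuition with probability 0.9, and everyone else with probability 0.4. A random sample estimates the population share well; a sample drawn only from students differs systematically from the population it is meant to represent.

```python
import random

def make_population(size, student_share, p_student, p_other, rng):
    """Hypothetical population: each person is (is_student, supports_lower_tuition)."""
    population = []
    for _ in range(size):
        is_student = rng.random() < student_share
        p = p_student if is_student else p_other
        population.append((is_student, rng.random() < p))
    return population

def support_share(people):
    """Share of people in the list who support lower tuition."""
    return sum(supports for _, supports in people) / len(people)

rng = random.Random(42)
population = make_population(100_000, 0.10, 0.90, 0.40, rng)
true_share = support_share(population)

# Random sample: every person has the same chance of being selected.
random_sample = rng.sample(population, 1_000)

# Selected sample: only students can be selected (non-students are excluded).
students = [person for person in population if person[0]]
student_sample = rng.sample(students, 1_000)

print(f"population share:    {true_share:.3f}")
print(f"random sample share: {support_share(random_sample):.3f}")
print(f"student-only share:  {support_share(student_sample):.3f}")
```

Increasing the size of the student-only sample would not remove the gap: the bias comes from how the sample is selected, not from how large it is.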
EXERCISE 1

American Express and the French tourist office sponsored a survey that found that most visitors to France do not consider the French to be especially unfriendly. The sample consisted of 1,000 Americans who have visited France more than once for pleasure over the past two years.

Is this survey unbiased?

ESTIMATION

- Parameter: a true characteristic of the distribution of a variable, whose value is unknown but can be estimated
  - Example: the population mean $E[X]$
- Estimator: a sample statistic that is used to estimate the value of the parameter
  - Example: the sample mean $\bar{X}_n$
  - Note that an estimator is a random variable (it has a probability distribution, a mean, a variance, ...)
- Estimate: the specific value of the estimator obtained from a specific sample

PROPERTIES OF AN ESTIMATOR

- An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it estimates
- An estimator is consistent if it converges (in probability) to the true value of the parameter as the sample size increases
- An estimator is efficient if the variance of its sampling distribution is the smallest possible among unbiased estimators of the same parameter
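Unbiasedness and consistency of the sample mean $\bar{X}_n$ can be seen in a short Monte Carlo sketch. The population (normal with mean 5 and standard deviation 2), the sample sizes and the seed are arbitrary assumptions for illustration. Each call to `sample_mean` draws a fresh sample, so the estimator produces a different estimate each time.

```python
import random
import statistics

# Hypothetical population: normal with mean MU and standard deviation SIGMA.
MU, SIGMA = 5.0, 2.0
rng = random.Random(0)

def sample_mean(n):
    """Draw one sample of size n and return its sample mean (the estimator X̄_n)."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))

# Unbiasedness: averaging the estimator over many repeated samples of a fixed,
# small size gives (approximately) the true parameter MU.
estimates = [sample_mean(10) for _ in range(2_000)]
print("average of 2,000 estimates (n = 10):", round(statistics.fmean(estimates), 3))

# Consistency: a single estimate tends to get closer to MU as the sample size grows.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: estimate = {sample_mean(n):.3f}")
```

Each individual estimate with n = 10 can be far from 5, but their average is close to 5 (unbiasedness), while a single estimate from a very large sample is close to 5 on its own (consistency).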
EXERCISE 2

A young econometrician wants to estimate the relationship between foreign direct investment (FDI) in her country and firm profitability. Her reasoning is that better managerial skills introduced by foreign owners increase firms' profitability.

She collects a random sample of 8,750 firms and finds that one sixth of the firms were entered by foreign investors within the last few years. The rest of the firms are owned domestically.

When she compares indicators of profitability, such as ROA and ROE, between the domestic and foreign-owned firms, she finds significantly better outcomes for foreign-owned firms. She concludes that FDI increases firms' profitability.

Is this conclusion correct?

ECONOMETRIC MODELS

An econometric model is an estimable formulation of a theoretical relationship.

- Theory says: $Q = f(P, P_s, Y)$, where
  - $Q$ ... quantity demanded
  - $P$ ... commodity's price
  - $P_s$ ... price of a substitute good
  - $Y$ ... disposable income
- We simplify: $Q = \beta_0 + \beta_1 P + \beta_2 P_s + \beta_3 Y$
- We estimate: $\hat{Q} = 31.50 - 0.73P + 0.11P_s + 0.23Y$

- Modern econometrics deals with many different models, some of them very general
- In this course we will cover only linear regression models
- We will see how these models are estimated by
  - Ordinary Least Squares (OLS)
  - Generalized Least Squares (GLS)
- We will perform estimation on different types of data

DATA USED IN ECONOMETRICS

- Cross-section: a sample of units (e.g. firms, individuals) taken at a given point in time
- Repeated cross-section: several independent samples of units (e.g. firms, individuals) taken at different points in time
- Time series: observations of variable(s) at different points in time
- Panel data: a time series for each cross-sectional unit in the data set

DATA USED IN ECONOMETRICS - EXAMPLES

- A country's macroeconomic indicators (GDP, inflation rate, net exports, etc.) month by month
- Data on firms' employees or financial indicators as of the end of the year
- Records of bank clients who were given a loan
- Annual social security or tax records of individual workers

STEPS OF AN ECONOMETRIC ANALYSIS

1. Formulation of an economic model (rigorous or intuitive)
2. Formulation of an econometric model based on the economic model
3. Collection of data
4. Estimation of the econometric model
5. Interpretation of results

EXAMPLE - ECONOMIC MODEL

Denote:
- $p$ ... price of the good
- $c$ ... firm's average cost per unit of output
- $q(p)$ ... demand for the firm's output

Firm profit: $\pi = q(p) \cdot (p - c)$
Demand for the good: $q(p) = a - b \cdot p$
Maximizing profit with respect to the price, $\partial \pi / \partial p = a - 2bp + bc = 0$, gives $p^* = \dfrac{a + bc}{2b}$, and substituting into the demand function we derive:
$$q = \frac{a}{2} - \frac{b}{2} \cdot c$$

We call $q$ the dependent variable and $c$ the explanatory variable.

EXAMPLE - ECONOMETRIC MODEL

- Write the relationship in a simple linear form: $q = \beta_0 + \beta_1 c$ (keeping in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
- There are other (unpredictable) things that influence firms' sales $\Rightarrow$ add a disturbance term: $q = \beta_0 + \beta_1 c + \varepsilon$
- Estimate the values of the parameters $\beta_1$ (slope) and $\beta_0$ (intercept)

EXAMPLE - DATA

- Ideally: investigate all firms in the economy
- In reality: investigate a sample of firms; we need a random (unbiased) sample of firms
- Collect data:

Firm | 1   | 2   | 3   | 4   | 5   | 6
q    | 15  | 32  | 52  | 14  | 37  | 27
c    | 294 | 247 | 153 | 350 | 173 | 218

[Figure: output (axis from 10 to 50) plotted against average cost (axis from 150 to 350) for the six firms]
EXAMPLE - ESTIMATION

[Figure: output plotted against average cost for the six firms]

OLS method:
Make the fit as good as possible
⇓
Make the misfit as low as possible
⇓
Minimize the (vertical) distance between the data points and the regression line
⇓
Minimize the sum of squared deviations

TERMINOLOGY

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ ... regression model
- $y_i$ ... dependent/explained variable (i-th observation)
- $x_i$ ... independent/explanatory variable (i-th observation)
- $\varepsilon_i$ ... random error term/disturbance (of the i-th observation)
- $\beta_0$ ... intercept parameter ($\hat{\beta}_0$ ... estimate of this parameter)
- $\beta_1$ ... slope parameter ($\hat{\beta}_1$ ... estimate of this parameter)

ORDINARY LEAST SQUARES

OLS = fitting the regression line by minimizing the sum of squared vertical distances between the regression line and the observed points

[Figure: output plotted against average cost for the six firms]

ORDINARY LEAST SQUARES - PRINCIPLE

- Take the squared difference between each observed point $y_i$ and the regression line $\beta_0 + \beta_1 x_i$: $(y_i - \beta_0 - \beta_1 x_i)^2$
- Sum them over all $n$ observations: $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
- Find $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize this sum:
$$\hat{\beta}_0, \hat{\beta}_1 = \operatorname*{argmin}_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
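The OLS principle ("find the pair that minimizes the sum of squared deviations") can be made concrete on the six-firm data by evaluating the sum of squared residuals over a grid of candidate intercepts and slopes. This brute-force grid search is only an illustration of the objective function, not how OLS is computed in practice (that is done with the closed-form formulas); the grid ranges and step sizes are arbitrary assumptions.

```python
# Data from the six-firm example: output q (y) and average cost c (x).
q = [15, 32, 52, 14, 37, 27]
c = [294, 247, 153, 350, 173, 218]

def ssr(b0, b1, x, y):
    """Sum of squared vertical deviations of the points (x_i, y_i) from the line b0 + b1 * x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# Crude grid search over candidate intercepts and slopes (illustration only).
best = None
for i in range(501):              # b0 from 60.00 to 85.00 in steps of 0.05
    b0 = 60 + 0.05 * i
    for j in range(251):          # b1 from -0.300 to -0.050 in steps of 0.001
        b1 = -0.30 + 0.001 * j
        value = ssr(b0, b1, c, q)
        if best is None or value < best[0]:
            best = (value, b0, b1)

min_ssr, b0_grid, b1_grid = best
print(f"grid minimum: b0 = {b0_grid:.2f}, b1 = {b1_grid:.3f}, SSR = {min_ssr:.1f}")
```

The grid minimum lands close to the OLS estimates for this sample (intercept near 71.7, slope near −0.177); a finer grid would move it closer, which is exactly what the closed-form solution delivers without searching.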
ORDINARY LEAST SQUARES - DERIVATION

$$\hat{\beta}_0, \hat{\beta}_1 = \operatorname*{argmin}_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

First-order conditions (FOC):
$$\frac{\partial}{\partial \beta_0}: \; -2 \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$
$$\frac{\partial}{\partial \beta_1}: \; -2 \sum_{i=1}^{n} x_i \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0$$

We derive (in the lecture):
$$\hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}$$
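The algebra between the first-order conditions and the two formulas can be sketched as follows, using $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i$ and dividing each FOC by $-2$:

```latex
\begin{aligned}
&\text{FOC for } \beta_0: && \sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)=0
  \;\Longrightarrow\; n\bar y_n-n\hat\beta_0-n\hat\beta_1\bar x_n=0
  \;\Longrightarrow\; \hat\beta_0=\bar y_n-\hat\beta_1\bar x_n \\
&\text{substitute into FOC for } \beta_1: && \sum_{i=1}^{n}x_i\bigl(y_i-\bar y_n-\hat\beta_1(x_i-\bar x_n)\bigr)=0
  \;\Longrightarrow\; \sum_{i=1}^{n}x_i(y_i-\bar y_n)=\hat\beta_1\sum_{i=1}^{n}x_i(x_i-\bar x_n) \\
&\text{replace } x_i \text{ by } x_i-\bar x_n: && \sum_{i=1}^{n}(x_i-\bar x_n)(y_i-\bar y_n)=\hat\beta_1\sum_{i=1}^{n}(x_i-\bar x_n)^2
  \;\Longrightarrow\; \hat\beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar x_n)(y_i-\bar y_n)}{\sum_{i=1}^{n}(x_i-\bar x_n)^2}
\end{aligned}
```

The last step is allowed because $\sum_{i=1}^{n}(y_i-\bar y_n)=0$ and $\sum_{i=1}^{n}(x_i-\bar x_n)=0$, so subtracting $\bar x_n$ from $x_i$ adds zero to both sides. Dividing requires $\sum_{i=1}^{n}(x_i-\bar x_n)^2>0$, that is, the $x_i$ must not all be equal.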
RESIDUAL

- The residual is the vertical difference between an observed point and the estimated regression line
- OLS minimizes the sum of squares of all residuals
- It is the difference between the observed value $y_i$ and the fitted value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
- We define: $e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$
- The residual $e_i$ (observed) is not the same as the disturbance $\varepsilon_i$ (unobserved)!
- The residual is an estimate of the disturbance: $e_i = \hat{\varepsilon}_i$

RESIDUAL VS. DISTURBANCE

[Figure: output plotted against average cost, with the true relationship, the estimated relationship, the disturbance and the residual marked]

GETTING BACK TO THE EXAMPLE

- We have the economic model $q = \frac{a}{2} - \frac{b}{2} \cdot c$
- We estimate $q_i = \beta_0 + \beta_1 c_i + \varepsilon_i$ (keeping in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
- on the data:

Firm | 1   | 2   | 3   | 4   | 5   | 6
q    | 15  | 32  | 52  | 14  | 37  | 27
c    | 294 | 247 | 153 | 350 | 173 | 218

When we plug the data into the formulas (with $\bar{q} = 29.5$ and $\bar{c} \approx 239.17$, the numerator of $\hat{\beta}_1$ is $-4875.5$ and the denominator is about $27602.8$):
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{6} (c_i - \bar{c})(q_i - \bar{q})}{\sum_{i=1}^{6} (c_i - \bar{c})^2} \approx -0.177$$
$$\hat{\beta}_0 = \bar{q} - \hat{\beta}_1 \bar{c} \approx 71.74$$

The estimated equation is $\hat{q} = 71.74 - 0.177c$, and so $\hat{a} = 2\hat{\beta}_0 \approx 143.48$ and $\hat{b} = -2\hat{\beta}_1 \approx 0.353$.
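The closed-form OLS formulas can be applied to the six-firm data in a few lines. This is a minimal sketch in plain Python (the helper name `ols_simple` is hypothetical, chosen for illustration); it also computes the residuals $e_i$, whose sum is zero up to floating-point error because of the first-order condition for $\beta_0$.

```python
# Six-firm example: output q and average cost c.
q = [15, 32, 52, 14, 37, 27]
c = [294, 247, 153, 350, 173, 218]

def ols_simple(x, y):
    """Closed-form OLS estimates (b0, b1) for y = b0 + b1 * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0_hat, b1_hat = ols_simple(c, q)
residuals = [qi - b0_hat - b1_hat * ci for ci, qi in zip(c, q)]

print(f"b0_hat = {b0_hat:.2f}, b1_hat = {b1_hat:.3f}")
print(f"a_hat = {2 * b0_hat:.2f}, b_hat = {-2 * b1_hat:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```

Up to rounding this gives $\hat{\beta}_0 \approx 71.74$ and $\hat{\beta}_1 \approx -0.177$; a slope of this size is the one consistent with $\hat{\beta}_0 = \bar{q} - \hat{\beta}_1 \bar{c} = 71.74$.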
MEANING OF REGRESSION COEFFICIENT

- Consider the model $q = \beta_0 + \beta_1 c$, estimated as $\hat{q} = 71.74 - 0.177c$
  - $q$ ... demand for the firm's output
  - $c$ ... firm's average cost per unit of output
- $\beta_1$ is the impact of a one-unit increase in $c$ on the dependent variable $q$
- When average cost increases by 1 unit, quantity demanded decreases by about 0.177 units
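Because the model is linear, the change in the fitted value for a one-unit increase in $c$ equals $\hat{\beta}_1$ at every cost level. A small sketch, assuming the rounded estimates from the six-firm data ($\hat{\beta}_0 \approx 71.74$, $\hat{\beta}_1 \approx -0.177$); the cost levels tried are arbitrary.

```python
# Estimated coefficients from the six-firm example (rounded).
B0_HAT, B1_HAT = 71.74, -0.177

def predicted_output(cost):
    """Fitted demand q_hat = B0_HAT + B1_HAT * cost."""
    return B0_HAT + B1_HAT * cost

# Raising average cost by one unit changes the prediction by B1_HAT,
# whatever the starting cost level.
for cost in (150, 250, 350):
    change = predicted_output(cost + 1) - predicted_output(cost)
    print(f"c = {cost} -> {cost + 1}: change in q_hat = {change:.3f}")
```

In a nonlinear specification this marginal effect would depend on the starting value of $c$; in the linear model it is a single number.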
BEHIND THE ERROR TERM

The stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. the stochastic character of unpredictable human behavior

Remember that all of these factors are included in the error term and may alter its properties. The properties of the error term determine the properties of the estimates.

SUMMARY

- We have learned that an econometric analysis consists of
  1. definition of the model
  2. estimation
  3. interpretation
- We have explained the principle of OLS: minimizing the sum of squared differences between the observations and the regression line
- We have derived the formulas for the estimates:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}, \qquad \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n$$

WHAT'S NEXT

In the next lectures, we will
- derive estimation formulas for multivariate models
- specify properties of the OLS estimator
- start using Gretl for data description and estimation