Econometrics
Panel Data Methods
Anna Donina
Lecture 9

Data used in Econometrics
Cross-sectional
• Data for different entities
• No time dimension
• Order of data does not matter
Time series
• Data for a single entity collected at multiple time periods
• Order of data is important
• Observations are typically not independent over time
Panel data
• Data for multiple entities in which outcomes and characteristics of each entity are observed at multiple points in time
• Combine cross-sectional and time series issues
• Present several advantages with respect to cross-sectional and time series data

Pooled Cross Sectional and Panel Data
An independently pooled cross section (or repeated cross section) is obtained by sampling randomly from a large population at different points in time (for example, annual labor force surveys).
A panel dataset contains observations on multiple entities (individuals, states, companies, …), where each entity is observed at two or more points in time.
Hypothetical examples:
• Data on 420 California school districts in 2010 and again in 2012, for 840 obs.
• Data on 50 U.S. states, each state is observed in 3 years, for a total of 150 obs.
• Data on 1000 individuals, in four different months, for 4000 obs total.

Panel Data
A double subscript distinguishes entities (individual units) and time periods (years):
• i = entity (state), n = number of entities, so i = 1,…,n
• t = time period (year), T = number of time periods, so t = 1,…,T
Data: Suppose we have 1 regressor. The data are:
$(X_{it}, Y_{it})$, $i = 1,\dots,n$, $t = 1,\dots,T$

Panel Data
Panel data with k regressors:
$(X_{1it}, X_{2it}, \dots, X_{kit}, Y_{it})$, $i = 1,\dots,n$, $t = 1,\dots,T$
• n = number of entities (states)
• T = number of time periods (years)
Another term for panel data is longitudinal data.
Balanced panel: no missing observations, that is, all variables are observed for all entities (states) and all time periods (years).

Why are Panel Data Useful?
With panel data we can control for factors that:
• Vary across entities but do not vary over time
• Could cause omitted variable bias if they are omitted
• Are unobserved or unmeasured, and therefore cannot be included in the regression using multiple regression
Here’s the key idea:
• If an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable.

Panel Data: Example of a Dataset
Observational unit: a year in a U.S. state
• 48 U.S. states, so n = # of entities = 48
• 7 years (2002,…, 2008), so T = # of time periods = 7
• Balanced panel, so total # observations = 7 × 48 = 336
Variables:
• Traffic fatality rate (# traffic deaths in that state in that year, per 10,000 state residents)
• Tax on a case of beer
• Other (legal driving age, drunk driving laws, etc.)

Policy Analysis with Pooled Cross Sections
Two or more independently sampled cross sections can be used to evaluate the impact of a certain event or policy change
• Example: Effect of new garbage incinerator on housing prices (Kiel and McClain (1995))
• Examine the effect of the location of a house on its price before and after the garbage incinerator was built:
  • After the incinerator was built (1981): no causality!
  • Before the incinerator was built (1978)
• Example: Garbage incinerator and housing prices (cont.)
• It would be wrong to conclude from the regression after the incinerator is there that being near the incinerator depresses prices so strongly
• One has to compare with the situation before the incinerator was built, that is, with the location effect estimated from the 1978 data
• In the given case, this is equivalent to comparing the change in average prices of houses near the incinerator with the change in average prices of houses farther away
• This is the so-called difference-in-differences estimator (DiD)
• The incinerator depresses prices, but the location was one with lower prices anyway

Policy Analysis with Pooled Cross Sections
• Difference-in-differences in a regression framework
• The coefficient of interest measures the differential effect of being in the location and after the incinerator was built
• In this way standard errors for the DiD effect can be obtained
• If houses sold before and after the incinerator was built were systematically different, further explanatory variables should be included
• This will also reduce the error variance and thus standard errors
• Before/after comparisons in “natural experiments”
• DiD can be used to evaluate policy changes or other exogenous events

Policy Analysis with Pooled Cross Sections
• Policy evaluation using difference-in-differences
Compare the difference in outcomes of the units that are affected by the policy change (= treatment group) and those that are not affected (= control group) before and after the policy was enacted.
For example, the level of unemployment benefits is cut, but only for group A (= treatment group). Group A normally has longer unemployment durations than group B (= control group). If the difference in unemployment durations between group A and group B becomes smaller after the reform, reducing unemployment benefits reduces unemployment duration for those affected.
Caution: Difference-in-differences only works if the difference in outcomes between the two groups is not changed by factors other than the policy change (e.g. there must be no differential trends).
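As an illustration of the treatment/control comparison described above, here is a minimal Python sketch on simulated data. All names and numbers (`treat`, `after`, `true_effect`, the cell sizes) are assumptions made for this example and do not come from the lecture. It computes the difference-in-differences from the four group/period averages and then recovers the same number as the coefficient on the interaction term in an OLS regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 100 observations in each of the four group/period cells.
treat = np.repeat([0, 1], 200)               # 1 = treatment group
after = np.tile(np.repeat([0, 1], 100), 2)   # 1 = observed after the policy change
true_effect = -2.0                           # assumed policy effect used to simulate y
y = (10.0 + 3.0 * treat + 1.5 * after
     + true_effect * treat * after
     + rng.normal(0.0, 1.0, size=treat.size))


def cell_mean(group, period):
    """Average outcome for one group in one period."""
    return y[(treat == group) & (after == period)].mean()


# DiD from the four cell means: change for treated minus change for controls.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same quantity from OLS: y on constant, treat, after, and treat*after.
X = np.column_stack([np.ones_like(y), treat, after, treat * after])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = coef[3]                            # coefficient on the interaction term

print(f"DiD from means: {did_means:.4f}")
print(f"DiD from OLS:   {did_ols:.4f}")
```

The two numbers agree up to floating-point error because the regression with both dummies and their interaction fits each of the four cell means exactly. The regression form is what makes standard errors and additional control variables available.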
Compare outcomes of the two groups before and after the policy change.

Policy Analysis with Pooled Cross Sections
Diff-in-Diff Estimator (DiD)
$\hat{\beta}_1^{DiD} = (\bar{Y}^{treat,after} - \bar{Y}^{treat,before}) - (\bar{Y}^{control,after} - \bar{Y}^{control,before})$

Two-Period Panel Data Analysis
• Example: Effect of unemployment on city crime rate
• Assume that no other explanatory variables are available. Will it be possible to estimate the causal effect of unemployment on crime?
• Yes, if cities are observed for at least two periods and other factors affecting crime stay approximately constant over those periods
• Besides the unemployment rate, the model contains a time dummy for the second period, unobserved time-constant factors (= fixed effect), and other unobserved factors (= idiosyncratic error)

Two-Period Panel Data Analysis
• Example: Effect of unemployment on city crime rate (cont.)
• Subtract the first-period equation from the second-period equation: the fixed effect drops out!
• Estimate the differenced equation by OLS
• The intercept of the differenced equation captures the secular increase in crime
• A 1 percentage point higher unemployment rate leads to 2.22 more crimes per 1,000 people

Two-Period Panel Data Analysis
• Discussion of first-differenced panel estimator
• Further explanatory variables may be included in the original equation
• Note that there may be arbitrary correlation between the unobserved time-invariant characteristics and the included explanatory variables
• OLS in the original equation would therefore be inconsistent
• The first-differenced panel estimator is thus a way to consistently estimate causal effects in the presence of time-invariant endogeneity
• For consistency, strict exogeneity has to hold in the original equation
• First-differenced estimates will be imprecise if explanatory variables vary only little over time (no estimate is possible if they are time-invariant)

Fixed Effects Estimation
Consider the panel data model,
$\text{FatalityRate}_{it} = \beta_0 + \beta_1 \text{BeerTax}_{it} + \beta_2 Z_i + u_{it}$
$Z_i$ is a factor that does not change over time, at least during the years on which we have data (examples: “culture” around drinking and driving; density of cars on the road).
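To see why differencing helps with a time-constant factor such as $Z_i$, here is a minimal simulation sketch in Python. The data-generating process and all names (`a` for the unobserved time-constant factor, `beta1`, the sample size) are assumptions chosen for illustration. The regressor is deliberately correlated with the unobserved factor, so pooled OLS is biased, while the first-differenced regression recovers the slope.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2000          # hypothetical number of entities, each observed in 2 periods
beta1 = 2.0       # assumed true slope

a = rng.normal(0.0, 1.0, n)            # unobserved time-constant factor (fixed effect)
x1 = a + rng.normal(0.0, 1.0, n)       # regressor in period 1, correlated with a
x2 = a + rng.normal(0.0, 1.0, n)       # regressor in period 2, correlated with a
y1 = 1.0 + beta1 * x1 + a + rng.normal(0.0, 1.0, n)
y2 = 1.5 + beta1 * x2 + a + rng.normal(0.0, 1.0, n)   # intercept shifts in period 2

# Pooled OLS that ignores a: constant, second-period dummy, x.
d2 = np.concatenate([np.zeros(n), np.ones(n)])
X_pooled = np.column_stack([np.ones(2 * n), d2, np.concatenate([x1, x2])])
b_pooled, *_ = np.linalg.lstsq(X_pooled, np.concatenate([y1, y2]), rcond=None)
slope_pooled = b_pooled[2]

# First differences: a drops out; the constant picks up the period shift.
X_fd = np.column_stack([np.ones(n), x2 - x1])
b_fd, *_ = np.linalg.lstsq(X_fd, y2 - y1, rcond=None)
slope_fd = b_fd[1]

print(f"true slope:       {beta1:.3f}")
print(f"pooled OLS slope: {slope_pooled:.3f}")   # biased upward in this design
print(f"first-diff slope: {slope_fd:.3f}")       # close to the true slope
```

In this design the pooled slope is pushed upward because the omitted factor moves together with the regressor, whereas the differenced equation no longer contains the omitted factor at all.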
• Suppose $Z_i$ is not observed, so its omission could result in omitted variable bias.
• The effect of $Z_i$ can be eliminated using T = 2 years by the differencing method described above.

Fixed Effects Estimation
What if you have more than 2 time periods (T > 2)?
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, $i = 1,\dots,n$, $t = 1,\dots,T$
We can rewrite this in two useful ways:
1. “n-1 binary regressor” regression model
2. “Fixed Effects” regression model
We first rewrite this in “fixed effects” form. Suppose we have n = 3 states: California, Texas, and Massachusetts, and we want to estimate the effect of the tax on a case of beer on the traffic fatality rate.

Fixed Effects Estimation
Population regression for California (that is, i = CA):
$Y_{CA,t} = \beta_0 + \beta_1 X_{CA,t} + \beta_2 Z_{CA} + u_{CA,t}$
$= (\beta_0 + \beta_2 Z_{CA}) + \beta_1 X_{CA,t} + u_{CA,t}$
or
$Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$
• $\alpha_{CA} = \beta_0 + \beta_2 Z_{CA}$ doesn’t change over time
• $\alpha_{CA}$ is the intercept for CA, and $\beta_1$ is the slope
• The intercept is unique to CA, but the slope is the same in all the states: parallel lines.

Fixed Effects Estimation
Population regression for Texas:
$Y_{TX,t} = \beta_0 + \beta_1 X_{TX,t} + \beta_2 Z_{TX} + u_{TX,t}$
$= (\beta_0 + \beta_2 Z_{TX}) + \beta_1 X_{TX,t} + u_{TX,t}$
or
$Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$, where $\alpha_{TX} = \beta_0 + \beta_2 Z_{TX}$
Collecting the lines for all three states:
$Y_{CA,t} = \alpha_{CA} + \beta_1 X_{CA,t} + u_{CA,t}$
$Y_{TX,t} = \alpha_{TX} + \beta_1 X_{TX,t} + u_{TX,t}$
$Y_{MA,t} = \alpha_{MA} + \beta_1 X_{MA,t} + u_{MA,t}$
or
$Y_{it} = \alpha_i + \beta_1 X_{it} + u_{it}$, i = CA, TX, MA, $t = 1,\dots,T$

Fixed Effects Estimation
Recall that shifts in the intercept can be represented using binary regressors…

Fixed Effects Estimation
In binary regressor form:
$Y_{it} = \beta_0 + \gamma_{CA} DCA_i + \gamma_{TX} DTX_i + \beta_1 X_{it} + u_{it}$
• $DCA_i$ = 1 if state is CA, = 0 otherwise
• $DTX_i$ = 1 if state is TX, = 0 otherwise
• leave out $DMA_i$ (why?)

Fixed Effects Estimation
1. “n-1 binary regressor” form
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + u_{it}$
where $D2_i$ is the binary indicator for state #2, etc.
2.
“Fixed effects” form:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
• $\alpha_i$ is called a “state fixed effect” or “state effect”: it is the constant (fixed) effect of being in state i
• In the binary regressor form, $D2_i = 1$ for i = 2 (state #2) and $D2_i = 0$ otherwise

Fixed Effects Estimation
Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification, without an intercept
These three methods produce identical estimates of the regression coefficients, and identical standard errors.
• We already did the “changes” specification, but it matches the other two methods only for T = 2 years
• Methods #1 and #2 work for general T
• Method #1 is only practical when n isn’t too big

Fixed Effects Estimation
• Fixed effects estimation
• The fixed effect $\alpha_i$ is potentially correlated with the explanatory variables
• Form time averages for each individual and subtract them from the original equation; because $\alpha_i - \bar{\alpha}_i = 0$, the fixed effect is removed
• Estimate the time-demeaned equation by OLS
• Uses time variation within cross-sectional units (= within-estimator)

Fixed Effects Estimation
• Example: Effect of training grants on firm scrap rate
• Fixed-effects estimation using the years 1987, 1988, 1989; stars denote time-demeaning
• Time-invariant reasons why one firm is more productive than another are controlled for. The important point is that these may be correlated with the other explanatory variables.
• Training grants significantly improve productivity (with a time lag)

Fixed Effects Estimation with Time Fixed Effects
An omitted variable might vary over time but not across states:
• Safer cars (air bags, etc.); changes in national laws
• These produce intercepts that change over time
• Let $S_t$ denote the combined effect of variables which change over time but not across states (“safer cars”).
• The resulting population regression model is:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$

Fixed Effects Estimation with Time Fixed Effects
Consider time effects only (that is, leave out $\beta_2 Z_i$). This model can be recast as having an intercept that varies from one year to the next:
$Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + \beta_3 S_{1982} + u_{i,1982}$
$= (\beta_0 + \beta_3 S_{1982}) + \beta_1 X_{i,1982} + u_{i,1982}$
$= \lambda_{1982} + \beta_1 X_{i,1982} + u_{i,1982}$, where $\lambda_{1982} = \beta_0 + \beta_3 S_{1982}$
Similarly,
$Y_{i,1983} = \lambda_{1983} + \beta_1 X_{i,1983} + u_{i,1983}$, where $\lambda_{1983} = \beta_0 + \beta_3 S_{1983}$, etc.

Fixed Effects Estimation with Time Fixed Effects
1. “T-1 binary regressor” formulation:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$
where $B2_t = 1$ when t = 2 (year #2) and $B2_t = 0$ otherwise, etc.
2. “Time effects” formulation:
$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}$

Fixed Effects Estimation with Time Fixed Effects
1. “T-1 binary regressor” OLS regression
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$
• Create binary variables B2,…,BT
• B2 = 1 if t = year #2, = 0 otherwise
• Regress Y on X, B2,…,BT using OLS
• Where’s B1?
2. “Year-demeaned” OLS regression
• Deviate $Y_{it}$, $X_{it}$ from year (not state) averages
• Estimate by OLS using “year-demeaned” data

Fixed Effects Estimation
• Discussion of fixed effects estimator
• Strict exogeneity in the original model has to be assumed
• The R-squared of the demeaned equation is inappropriate
• The effect of time-invariant variables cannot be estimated
• But the effect of interactions with time-invariant variables can be estimated (e.g. the interaction of education with time dummies)
• If a full set of time dummies is included, the effect of variables whose change over time is constant cannot be estimated (e.g. experience)

• Fixed effects or first differencing?
• In the case T = 2, fixed effects and first differencing are identical
• For T > 2, fixed effects is more efficient if classical assumptions hold
• First differencing may be better in the case of severe serial correlation in the errors, for example if the errors follow a random walk
• If T is very large (and N not so large), the panel has a pronounced time series character and problems such as strong dependence arise (unit root process, spurious regression)
• In these cases, it is probably better to use first differencing
• Otherwise, it is a good idea to compute both and check robustness
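The claim that the approaches coincide for T = 2 can be checked numerically. The sketch below is an illustration on simulated data, with all names and parameter values (`alpha`, `d2`, the slope 1.5, n = 30) assumed for the example. It estimates the same slope three ways: OLS with an intercept and n-1 entity dummies, OLS on entity-demeaned data, and OLS on first differences. Because the simulated model includes a second-period dummy, the differenced regression here carries an intercept.

```python
import numpy as np

rng = np.random.default_rng(2)

n, T = 30, 2                                   # hypothetical panel dimensions
alpha = rng.normal(0.0, 2.0, n)                # entity fixed effects
x = alpha[:, None] + rng.normal(0.0, 1.0, (n, T))   # regressor correlated with alpha
d2 = np.tile([0.0, 1.0], (n, 1))               # dummy for the second period
y = alpha[:, None] + 0.7 * d2 + 1.5 * x + rng.normal(0.0, 1.0, (n, T))


def ols(X, target):
    """Return OLS coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]


# Method 1: intercept plus n-1 entity dummies (rows are ordered entity by entity).
D = np.kron(np.eye(n), np.ones((T, 1)))        # one dummy column per entity
X_dummies = np.column_stack([np.ones(n * T), x.ravel(), d2.ravel(), D[:, 1:]])
slope_dummies = ols(X_dummies, y.ravel())[1]


# Method 2: subtract each entity's time average, then run OLS without intercept.
def demean(z):
    return z - z.mean(axis=1, keepdims=True)


X_within = np.column_stack([demean(x).ravel(), demean(d2).ravel()])
slope_within = ols(X_within, demean(y).ravel())[0]

# Method 3: first differences; the intercept absorbs the period effect.
X_fd = np.column_stack([x[:, 1] - x[:, 0], np.ones(n)])
slope_fd = ols(X_fd, y[:, 1] - y[:, 0])[0]

print(slope_dummies, slope_within, slope_fd)
```

With T = 2 the three slopes agree up to floating-point error. For T > 2 the first two would still coincide, while first differencing would generally give a different estimate, which is why computing both is a useful robustness check.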