Econometrics
Panel Data Methods
Anna Donina
Lecture 9
Data used in Econometrics
Cross-sectional
• Data for different entities
• No time dimension
• Order of data does not matter
Time series
• Data for a single entity collected at multiple time periods.
• Order of data is important
• Observations are typically not independent over time
Panel data
• Data for multiple entities in which outcomes and characteristics of each entity
are observed at multiple points in time.
• Combine cross-sectional and time series issues
• Present several advantages with respect to cross-sectional and time series data
Pooled Cross Sectional and Panel Data
An independently pooled cross section (or repeated cross section) is obtained by
sampling randomly from a large population at different points in time (for example,
annual labor force surveys). The sampled units generally differ from one period to the next.
A panel dataset contains observations on multiple entities (individuals, states,
companies…), where each entity is observed at two or more points in time.
Hypothetical examples:
• Data on 420 California school districts in 2010 and again in 2012, for 840 obs.
• Data on 50 U.S. states, each state is observed in 3 years, for a total of 150 obs.
• Data on 1000 individuals, in four different months, for 4000 obs total.
Panel Data
A double subscript distinguishes entities (individual units) and time
periods (years)
i = entity (state), n = number of entities, so i = 1,…,n
t = time period (year), T = number of time periods, so t =1,…,T
Data: Suppose we have 1 regressor.
The data are:
(Xit, Yit), i = 1,…,n, t = 1,…,T
Panel Data
Panel data with k regressors:
(X1it, X2it,…,Xkit, Yit), i = 1,…,n, t = 1,…,T
n = number of entities (states)
T = number of time periods (years)
Another term for panel data is longitudinal data
Balanced panel: no missing observations, that is, all variables are
observed for all entities (states) and all time periods (years)
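The balanced-panel definition can be checked mechanically. The sketch below is only an illustration: the function name `is_balanced` and the (entity, period) tuple layout are assumptions, and it checks entity-period coverage only, not missing values of individual variables.

```python
from collections import defaultdict

def is_balanced(rows):
    """rows: iterable of (entity, period) pairs, one per observation."""
    periods_by_entity = defaultdict(set)
    for entity, period in rows:
        periods_by_entity[entity].add(period)
    if not periods_by_entity:
        return True
    # Every entity must be observed in every period that appears in the data.
    all_periods = set().union(*periods_by_entity.values())
    return all(p == all_periods for p in periods_by_entity.values())

# Hypothetical panel: n = 3 states, T = 2 years, so n * T = 6 observations.
balanced = [(s, t) for s in ("CA", "TX", "MA") for t in (2010, 2012)]
unbalanced = balanced[:-1]  # drop the (MA, 2012) observation

print(len(balanced), is_balanced(balanced))
print(len(unbalanced), is_balanced(unbalanced))
```

In a balanced panel the number of observations equals n × T, which is how the totals in the examples above (840, 150, 4000) arise.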
Why are Panel Data Useful?
With panel data we can control for factors that:
Vary across entities but do not vary over time
• These could cause omitted variable bias if they are omitted
Are unobserved or unmeasured – and therefore cannot be included in
the regression using multiple regression
Here’s the key idea:
• If an omitted variable does not change over time, then any changes in
Y over time cannot be caused by the omitted variable.
Panel Data: Example of a Dataset
Observational unit: a year in a U.S. state
• 48 U.S. states, so n = # of entities = 48
• 7 years (2002,…, 2008), so T = # of time periods = 7
• Balanced panel, so total # observations = 7×48 = 336
Variables:
• Traffic fatality rate (# traffic deaths in that state in that year,
per 10,000 state residents)
• Tax on a case of beer
• Other (legal driving age, drunk driving laws, etc.)
Policy Analysis with Pooled Cross Sections
Two or more independently sampled cross sections can be used
to evaluate the impact of a certain event or policy change
• Example: Effect of new garbage incinerator on housing prices
(Kiel and McClain (1995))
• Examine the effect of the location of a house on its price before
and after the garbage incinerator was built:
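Regression after the incinerator was built (1981): no causal interpretation! [estimated equation not recovered from the slide]
Regression before the incinerator was built (1978): [estimated equation not recovered from the slide]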
• Example: Garbage incinerator and housing prices (cont.)
• It would be wrong to conclude from the regression after the
incinerator is there that being near the incinerator depresses
prices so strongly
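• One has to compare with the situation before the incinerator
was built: the incinerator depresses prices, but the location was
one with lower prices anyway
• In the given case, this is equivalent to taking the average price difference between
houses near and far from the incinerator in 1981 and subtracting the same difference for 1978
• This is the so-called difference-in-differences estimator (DiD)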
Policy Analysis with Pooled Cross Sections
• Difference-in-differences in a regression framework: regress the house price
on an after-period dummy, a location dummy, and their interaction
• The coefficient on the interaction is the differential effect of being in the
location after the incinerator was built
• In this way standard errors for the DiD-effect can be obtained
• If houses sold before and after the incinerator was built were
systematically different, further explanatory variables should
be included
• This will also reduce the error variance and thus standard
errors
• Before/After comparisons in “natural experiments”
• DiD can be used to evaluate policy changes or other
exogenous events
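The regression equation itself did not survive on this slide. A standard way to write difference-in-differences as a regression is sketched below; the variable names price, after, and near are assumptions chosen for this example, not the names used in the original study.

```latex
\[
\mathit{price}_i = \beta_0 + \delta_0\,\mathit{after}_i + \beta_1\,\mathit{near}_i
                 + \delta_1\,(\mathit{after}_i \cdot \mathit{near}_i) + u_i
\]
% after_i = 1 if the house was sold after the incinerator was built, 0 otherwise
% near_i  = 1 if the house is located near the incinerator, 0 otherwise
\[
\delta_1 = \bigl(E[\mathit{price}\mid \mathit{near}{=}1,\mathit{after}{=}1]
               - E[\mathit{price}\mid \mathit{near}{=}0,\mathit{after}{=}1]\bigr)
         - \bigl(E[\mathit{price}\mid \mathit{near}{=}1,\mathit{after}{=}0]
               - E[\mathit{price}\mid \mathit{near}{=}0,\mathit{after}{=}0]\bigr)
\]
```

In this form the OLS estimate of δ1 is the DiD estimate, and its standard error comes directly from the regression output.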
Policy Analysis with Pooled Cross Sections
• Policy evaluation using difference-in-differences
Compare the difference in outcomes of the units that are affected by the policy change (=
treatment group) and those who are not affected (= control group) before and after the
policy was enacted.
For example, the level of unemployment benefits is cut but only for group A (= treatment
group). Group A normally has longer unemployment durations than group B (= control
group). If the difference in unemployment durations between group A and group B
becomes smaller after the reform, this indicates that reducing unemployment benefits reduces
unemployment duration for those affected.
Caution: Difference-in-differences only works if the difference in outcomes between the
two groups is not changed by other factors than the policy change (e.g. there must be no
differential trends).
Compare outcomes of the two groups
before and after the policy change
Policy Analysis with Pooled Cross Sections
Diff-in-Diff Estimator (DiD)
$$\hat{\beta}_1^{DiD} = \left(\bar{Y}^{treat,after} - \bar{Y}^{treat,before}\right) - \left(\bar{Y}^{control,after} - \bar{Y}^{control,before}\right)$$
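The estimator is just a difference of two before/after changes in group means. A minimal sketch, assuming made-up outcome values (for instance unemployment durations in weeks) and a hypothetical helper named `did_estimate`:

```python
from statistics import mean

def did_estimate(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences computed from four groups of outcomes."""
    change_treat = mean(treat_after) - mean(treat_before)
    change_control = mean(control_after) - mean(control_before)
    return change_treat - change_control

# Hypothetical outcomes; the numbers are invented for illustration only.
treat_before = [20, 22, 24]    # mean 22
treat_after = [17, 18, 19]     # mean 18
control_before = [14, 15, 16]  # mean 15
control_after = [13, 14, 15]   # mean 14

effect = did_estimate(treat_before, treat_after, control_before, control_after)
print(effect)  # (18 - 22) - (14 - 15) = -3
```

The control group's change (−1) stands in for what would have happened to the treatment group without the policy, which is exactly the "no differential trends" requirement.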
• Example: Effect of unemployment on city crime rate
• Assume that no other explanatory variables are available. Will
it be possible to estimate the causal effect of unemployment on
crime?
• Yes, if cities are observed for at least two periods and other
factors affecting crime stay approximately constant over those
periods [model equation not recovered from the slide]. The model contains:
• A time dummy for the second period
• Unobserved time-constant factors (= fixed effect)
• Other unobserved factors (= idiosyncratic error)
Two-Period Panel Data Analysis
• Example: Effect of unemployment on city crime rate (cont.)
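• Subtract the first-period equation from the second-period equation:
the fixed effect drops out!
• Estimate differenced equation by OLS [estimated equation not recovered from the slide]
• The intercept captures the secular increase in crime
• A 1 percentage point higher unemployment rate leads to 2.22 more crimes
per 1,000 people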
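The differencing step can be written out as follows. This is a sketch under assumed notation: crime and unem are hypothetical variable names for the crime and unemployment rates, a_i is the fixed effect, u_it the idiosyncratic error, and δ0 the coefficient on the second-period dummy.

```latex
\begin{align*}
\mathit{crime}_{i2} &= (\beta_0 + \delta_0) + \beta_1\,\mathit{unem}_{i2} + a_i + u_{i2}
  && \text{(second period)} \\
\mathit{crime}_{i1} &= \beta_0 + \beta_1\,\mathit{unem}_{i1} + a_i + u_{i1}
  && \text{(first period)} \\
\Delta \mathit{crime}_i &= \delta_0 + \beta_1\,\Delta \mathit{unem}_i + \Delta u_i
  && \text{(difference: } a_i \text{ cancels)}
\end{align*}
```

The intercept δ0 of the differenced equation is the secular change in crime between the two periods, and β1 is the coefficient reported as 2.22 above.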
Two-Period Panel Data Analysis
• Discussion of first-differenced panel estimator
• Further explanatory variables may be included in the original
equation
• Note that there may be arbitrary correlation between the
unobserved time-invariant characteristics and the included
explanatory variables
• OLS in the original equation would therefore be inconsistent
• The first-differenced panel estimator is thus a way to
consistently estimate causal effects in the presence of time-invariant
endogeneity
• For consistency, strict exogeneity has to hold in the original
equation
• First-differenced estimates will be imprecise if explanatory
variables vary only little over time (no estimate possible if
time-invariant)
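A small numerical sketch of these points, using invented noise-free data in which the fixed effect is negatively correlated with the regressor. All names and numbers are assumptions for illustration; the true slope is set to 2.0 and the period-2 shift to 1.5.

```python
def ols_intercept_slope(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: y_it = a_i + 1.5 * d2_t + 2.0 * x_it (no noise).
a = [10.0, 30.0, 50.0, 70.0]   # fixed effects, correlated with x
x1 = [8.0, 6.0, 4.0, 2.0]      # regressor in period 1
x2 = [9.0, 8.0, 7.0, 6.0]      # regressor in period 2
y1 = [ai + 2.0 * xi for ai, xi in zip(a, x1)]
y2 = [ai + 1.5 + 2.0 * xi for ai, xi in zip(a, x2)]

# Pooled OLS ignores a_i, so the slope absorbs the omitted fixed effect.
_, pooled_slope = ols_intercept_slope(x1 + x2, y1 + y2)

# First differences remove a_i; the intercept is the period-2 shift.
dx = [b - c for b, c in zip(x2, x1)]
dy = [b - c for b, c in zip(y2, y1)]
fd_intercept, fd_slope = ols_intercept_slope(dx, dy)

print(pooled_slope)            # negative here, far from the true 2.0
print(fd_intercept, fd_slope)  # approximately 1.5 and 2.0
```

If x did not change over time, every element of `dx` would be zero and the slope would be undefined, which is the "no estimate possible if time-invariant" case.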
Two-Period Panel Data Analysis
Fixed Effects Estimation
Consider the panel data model,
FatalityRateit = β0 + β1BeerTaxit + β2Zi + uit
Zi is a factor that does not change over time, at least during the years
on which we have data (examples: “culture” around drinking and driving;
density of cars on the road).
• Suppose Zi is not observed, so its omission could result in omitted
variable bias.
• The effect of Zi can be eliminated using T = 2 years by the differencing
method described above.
Fixed Effects Estimation
What if you have more than 2 time periods (T > 2)?
Yit = β0 + β1Xit + β2Zi + uit, i = 1,…,n, t = 1,…,T
We can rewrite this in two useful ways:
1. “n-1 binary regressor” regression model
2. “Fixed Effects” regression model
We first rewrite this in “fixed effects” form. Suppose we have n = 3
states: California, Texas, and Massachusetts and we want to estimate
the effect of tax on a case of beer on the traffic fatality rate.
Fixed Effects Estimation
Population regression for California (that is, i = CA):
YCA,t = β0 + β1XCA,t + β2ZCA + uCA,t
= (β0 + β2ZCA) + β1XCA,t + uCA,t
Or
YCA,t = αCA + β1XCA,t + uCA,t
• αCA = β0 + β2ZCA doesn’t change over time
• αCA is the intercept for CA, and β1 is the slope
• The intercept is unique to CA, but the slope is the same in all the
states: parallel lines.
Fixed Effects Estimation
YTX,t = β0 + β1XTX,t + β2ZTX + uTX,t
= (β0 + β2ZTX) + β1XTX,t + uTX,t
(population regression for Texas)
or
YTX,t = αTX + β1XTX,t + uTX,t, where αTX = β0 + β2ZTX
Collecting the lines for all three states:
YCA,t = αCA + β1XCA,t + uCA,t
YTX,t = αTX + β1XTX,t + uTX,t
YMA,t = αMA + β1XMA,t + uMA,t
or
Yit = αi + β1Xit + uit, i = CA, TX, MA, t = 1,…,T
Fixed Effects Estimation
Recall that shifts in the intercept can be represented using binary
regressors…
Fixed Effects Estimation
In binary regressor form:
Yit = β0 + γCADCAi + γTXDTXi + β1Xit + uit
DCAi = 1 if state is CA, = 0 otherwise
DTXi = 1 if state is TX, = 0 otherwise
leave out DMAi (why?)
Fixed Effects Estimation
1. “n-1 binary regressor” form
Yit = β0 + β1Xit + γ2D2i + … + γnDni + uit
where D2i = 1 for i = 2 (state #2), = 0 otherwise, etc.
2. “Fixed effects” form:
Yit = β1Xit + αi + uit
• αi is called a “state fixed effect” or “state effect” – it is the constant
(fixed) effect of being in state i
Fixed Effects Estimation
Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification, without an intercept
When all three apply, these methods produce identical estimates of the regression
coefficients, and identical standard errors.
We already did the “changes” specification – but this only works for T
= 2 years
Methods #1 and #2 work for general T
Method #1 is only practical when n isn’t too big
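The equivalence of the three methods can be checked numerically for T = 2. The sketch below uses NumPy with simulated data; the sample size, the entity effects, and the true slope of 0.5 are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 2
entity = np.repeat(np.arange(n), T)           # 0,0,1,1,...: two periods per entity
alpha = np.array([1.0, 4.0, 2.0, 8.0, 5.0])   # hypothetical entity effects
x = rng.normal(size=n * T) + alpha[entity]    # regressor correlated with alpha
y = alpha[entity] + 0.5 * x + rng.normal(size=n * T)

# Method 1: OLS with an intercept and n-1 binary regressors (entity 0 omitted).
dummies = (entity[:, None] == np.arange(1, n)[None, :]).astype(float)
X1 = np.column_stack([np.ones(n * T), x, dummies])
beta_dummy = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# Method 2: entity-demeaned OLS without an intercept.
def demean(v):
    means = np.bincount(entity, weights=v) / np.bincount(entity)
    return v - means[entity]

xd, yd = demean(x), demean(y)
beta_within = (xd @ yd) / (xd @ xd)

# Method 3: "changes" specification without an intercept (T = 2 only).
dx = x[1::2] - x[0::2]
dy = y[1::2] - y[0::2]
beta_changes = (dx @ dy) / (dx @ dx)

print(beta_dummy, beta_within, beta_changes)  # three equal slope estimates
```

Method 1 needs n − 1 dummy columns, which is why it becomes impractical for large n, while the demeaned regression has a single column regardless of n.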
• Fixed effects estimation
• The fixed effect is potentially correlated with the explanatory variables
• Form time-averages for each individual and subtract them from the
original equation [equations not recovered from the slide]
• The fixed effect is removed because it equals its own time average
• Estimate time-demeaned equation by OLS
• Uses time variation within cross-sectional units (= within-estimator)
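The time-demeaning steps can be written out for a single regressor. The notation below (a_i for the fixed effect, bars for time averages over t = 1,…,T) is an assumption chosen to match the annotations on the slide, whose original equations were not recovered.

```latex
\begin{align*}
y_{it} &= \beta_1 x_{it} + a_i + u_{it}
  && \text{(original equation)} \\
\bar{y}_i &= \beta_1 \bar{x}_i + a_i + \bar{u}_i
  && \text{(time average for entity } i\text{)} \\
y_{it} - \bar{y}_i &= \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)
  && \text{(subtract: } a_i - a_i = 0\text{)}
\end{align*}
```

OLS on the last line is the within estimator; it uses only variation over time within each entity.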
Fixed Effects Estimation
• Example: Effect of training grants on firm scrap rate
Fixed-effects estimation using the years 1987, 1988, 1989:
[estimated equation not recovered from the slide; stars denote time-demeaned variables]
Time-invariant reasons why one firm is more productive than another are controlled for. The
important point is that these may be correlated with the other explanatory variables.
Training grants significantly improve productivity (with a time lag)
Fixed Effects Estimation
Fixed Effects Estimation with Time Fixed Effects
An omitted variable might vary over time but not across states:
• Safer cars (air bags, etc.); changes in national laws
• These produce intercepts that change over time
• Let St denote the combined effect of variables that change over
time but not across states (“safer cars”).
• The resulting population regression model is:
Yit = β0 + β1Xit + β2Zi + β3St + uit
Fixed Effects Estimation with Time Fixed Effects
Setting aside the entity-specific term β2Zi for the moment, this model can be
recast as having an intercept that varies from one year to the next:
Yi,1982 = β0 + β1Xi,1982 + β3S1982 + ui,1982
= (β0 + β3S1982) + β1Xi,1982 + ui,1982
= λ1982 + β1Xi,1982 + ui,1982,
where λ1982 = β0 + β3S1982. Similarly,
Yi,1983 = λ1983 + β1Xi,1983 + ui,1983,
where λ1983 = β0 + β3S1983, etc.
Fixed Effects Estimation with Time Fixed Effects
1. “T-1 binary regressor” formulation:
Yit = β0 + β1Xit + δ2B2t + … + δTBTt + uit
where B2t = 1 when t = 2 (year #2), = 0 otherwise, etc.
2. “Time effects” formulation:
Yit = β1Xit + λt + uit
Fixed Effects Estimation with Time Fixed Effects
1. “T-1 binary regressor” OLS regression
Yit = β0 + β1Xit + δ2B2t + … + δTBTt + uit
• Create binary variables B2,…,BT
• B2 = 1 if t = year #2, = 0 otherwise
• Regress Y on X, B2,…,BT using OLS
• Where’s B1?
2. “Year-demeaned” OLS regression
• Deviate Yit, Xit from year (not state) averages
• Estimate by OLS using “year-demeaned” data
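The “year-demeaned” regression can be sketched in a few lines. The data below are invented and noise-free, with a true slope of 3.0 and year intercepts stored in a hypothetical dictionary `lam`; the helper names are also assumptions.

```python
from collections import defaultdict

def demean_by(values, groups):
    """Subtract from each value the mean of its group (here: its year)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

def slope_no_intercept(x, y):
    """OLS slope of y on x without a constant."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical data: y_it = lambda_t + 3.0 * x_it, three states in three years.
lam = {1: 0.0, 2: 5.0, 3: -2.0}            # year-specific intercepts
years = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 4.0, 2.0, 3.0, 7.0, 0.0, 1.0, 5.0]
y = [lam[t] + 3.0 * xi for t, xi in zip(years, x)]

# Deviations from year averages remove lambda_t, then OLS recovers the slope.
beta = slope_no_intercept(demean_by(x, years), demean_by(y, years))
print(beta)  # approximately 3.0
```

Grouping by state instead of by year in `demean_by` would give the entity-demeaned regression used for state fixed effects.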
• Discussion of fixed effects estimator
• Strict exogeneity in the original model has to be assumed
• The R-squared of the demeaned equation is inappropriate
• The effect of time-invariant variables cannot be estimated
• But the effect of interactions with time-invariant variables
can be estimated (e.g. the interaction of education with time
dummies)
• If a full set of time dummies is included, the effect of
variables whose change over time is constant cannot be
estimated (e.g. experience)
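Two of these limitations can be seen directly in the demeaned data. The sketch below uses an invented balanced panel of two workers over three years; the variable names `educ` and `exper` are assumptions, and demeaning by entity and then by year is used as a stand-in for including a full set of time dummies, which is valid for a balanced panel.

```python
def demean_by(values, groups):
    """Subtract from each value the mean of its group."""
    means = {
        g: sum(v for v, h in zip(values, groups) if h == g) / groups.count(g)
        for g in set(groups)
    }
    return [v - means[g] for v, g in zip(values, groups)]

# Hypothetical balanced panel: workers A and B, years 1 to 3.
entities = ["A", "A", "A", "B", "B", "B"]
years = [1, 2, 3, 1, 2, 3]
educ = [12, 12, 12, 16, 16, 16]   # time-invariant
exper = [1, 2, 3, 5, 6, 7]        # grows by one year for everyone

# Entity-demeaning wipes out a time-invariant regressor completely.
educ_within = demean_by(educ, entities)

# Experience survives entity-demeaning, but its within variation is the same
# for every worker in a given year, so removing year means leaves nothing.
exper_within = demean_by(exper, entities)
exper_two_way = demean_by(exper_within, years)

print(educ_within)    # all zeros
print(exper_within)   # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
print(exper_two_way)  # all zeros
```

A regressor that is identically zero after the transformation has no variation left, so its coefficient cannot be estimated.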
Fixed Effects Estimation
• Fixed effects or first differencing?
• In the case T = 2, fixed effects and first differencing are identical
• For T > 2, fixed effects is more efficient if classical assumptions
hold
• First differencing may be better in the case of severe serial
correlation in the errors, for example if the errors follow a
random walk
• If T is very large (and n not so large), the panel has a
pronounced time series character and problems such as strong
dependence arise (unit root processes, spurious regression)
• In these cases, it is probably better to use first differencing
• Otherwise, it is a good idea to compute both and check
robustness
Fixed Effects Estimation