Panel Data Methods
Ketevani Kapanadze
Brno, 2020
Pooled Cross Sectional and Panel Data
An independently pooled cross section (or repeated cross section) is obtained by sampling
randomly from a large population at different points in time (for example, annual labor force
surveys).
A panel dataset contains observations on multiple entities (individuals, states, companies…), where
each entity is observed at two or more points in time.
Hypothetical examples:
• Data on 420 California school districts in 2010 and again in 2012, for 840 observations total.
• Data on 50 U.S. states, each observed in 3 years, for a total of 150 observations.
• Data on 1000 individuals, in four different months, for 4000 observations total.
Panel Data
A double subscript distinguishes entities (states) and time periods (years)
i = entity (state), n = number of entities,
so i = 1,…,n
t = time period (year), T = number of time periods
so t =1,…,T
Data: Suppose we have 1 regressor. The data are:
(Xit, Yit), i = 1,…,n, t = 1,…,T
Panel Data
Panel data with k regressors:
(X1it, X2it,…,Xkit, Yit), i = 1,…,n, t = 1,…,T
n = number of entities (states)
T = number of time periods (years)
Some jargon…
• Another term for panel data is longitudinal data
• balanced panel: no missing observations, that is, all variables are observed for all entities (states)
and all time periods (years)
Why are Panel Data Useful?
With panel data we can control for factors that:
• Vary across entities but do not vary over time
• Could cause omitted variable bias if they are omitted
• Are unobserved or unmeasured, and therefore cannot be included in a multiple regression
Here’s the key idea:
If an omitted variable does not change over time, then any changes in Y over time cannot be
caused by the omitted variable.
Panel Data: Example of a Dataset
Observational unit: a year in a U.S. state
• 48 U.S. states, so n = # of entities = 48
• 7 years (2002,…, 2008), so T = # of time periods = 7
• Balanced panel, so total # observations = 7×48 = 336
Variables:
• Traffic fatality rate (# traffic deaths in that state in that year, per 10,000 state residents)
• Tax on a case of beer
• Other (legal driving age, drunk driving laws, etc.)
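The structure of this dataset can be sketched in a few lines of Python. This is only an illustration of the (i, t) indexing and of the balanced-panel check; the state labels are hypothetical placeholders, since the slides give only the counts (48 states, years 2002 to 2008).

```python
from itertools import product
from collections import Counter

# Hypothetical labels: only the counts come from the slides.
states = [f"state_{k:02d}" for k in range(1, 49)]   # n = 48 entities
years = list(range(2002, 2009))                     # T = 7 time periods

# One row per (entity i, time period t) pair.
panel_index = list(product(states, years))

def is_balanced(index, n_periods):
    """True if every entity appears n_periods times (assumes no duplicate rows)."""
    counts = Counter(entity for entity, _ in index)
    return all(c == n_periods for c in counts.values())

n, T = len(states), len(years)
total_obs = len(panel_index)                 # n * T observations
balanced = is_balanced(panel_index, T)

# Dropping a single state-year makes the panel unbalanced.
balanced_after_drop = is_balanced(panel_index[1:], T)
```

With all 48 × 7 state-years present the check passes; removing any one observation makes it fail.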
Policy Analysis with Pooled Cross Sections
Two or more independently sampled cross sections can be used to evaluate the impact of a
certain event or policy change
• Example: Effect of a new garbage incinerator on housing prices (Kiel and
McClain (1995))
• Examine the effect of the location of a house on its price before and after the garbage
incinerator was built, with a separate regression for each year:
• 1981: after the incinerator was built
• 1978: before the incinerator was built
• Example: Garbage incinerator and housing prices (cont.)
• It would be wrong to conclude from the 1981 regression alone (after the incinerator was built) that being
near the incinerator depresses prices so strongly
• One has to compare with the situation before the incinerator was built:
DiD estimate = (location coefficient, 1981) – (location coefficient, 1978)
• In the given case, this is equivalent to
(mean price near – mean price far, 1981) – (mean price near – mean price far, 1978)
• This is the so-called difference-in-differences estimator (DiD)
• Interpretation: the incinerator depresses prices, but the location was one with lower prices anyway
Policy Analysis with Pooled Cross Sections
• Difference-in-differences in a regression framework: regress price on a dummy for the period after the
incinerator was built, a dummy for being near the incinerator, and their interaction
price = β0 + δ0·after + β1·near + δ1·(after × near) + u
• δ1 is the differential effect of being in the location and after the incinerator was built (the DiD effect)
• In this way standard errors for the DiD effect can be obtained
• If houses sold before and after the incinerator was built were systematically different, further
explanatory variables should be included
• This will also reduce the error variance and thus standard errors
• Before/after comparisons in “natural experiments”
• DiD can be used to evaluate policy changes or other exogenous events
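The link between the two views of DiD (a difference of group-mean differences, and an interaction coefficient in a regression) can be checked numerically. The sketch below uses synthetic data, not the Kiel and McClain sample; the variable names after, near, price and the value true_did are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 400

# Synthetic data (illustrative only).
after = rng.integers(0, 2, n_obs)   # 1 if sold after the incinerator was built
near = rng.integers(0, 2, n_obs)    # 1 if the house is near the incinerator
true_did = -10.0                    # assumed effect, arbitrary price units
price = (100 + 15 * after - 20 * near + true_did * after * near
         + rng.normal(0, 5, n_obs))

def group_mean(a, nr):
    """Mean price in the cell (after == a, near == nr)."""
    return price[(after == a) & (near == nr)].mean()

# (1) DiD from the four group means.
did_means = ((group_mean(1, 1) - group_mean(1, 0))
             - (group_mean(0, 1) - group_mean(0, 0)))

# (2) DiD as the coefficient on the interaction term in OLS.
X = np.column_stack([np.ones(n_obs), after, near, after * near])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
did_ols = coef[3]
```

Because the regression with both dummies and their interaction is saturated, the two numbers coincide up to floating-point error. The regression form is the one that extends to additional explanatory variables and yields standard errors.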
Policy Analysis with Pooled Cross Sections
• Policy evaluation using difference-in-differences: compare outcomes of the two groups
before and after the policy change
Caution: Difference-in-differences only works if the difference in outcomes between the two groups is not
changed by factors other than the policy change (e.g. there must be no differential trends).
Policy Analysis with Pooled Cross Sections
β̂1(diffs-in-diffs) = (Ȳtreat,after – Ȳtreat,before) – (Ȳcontrol,after – Ȳcontrol,before)
Diff-in-Diff Estimator (DID)
[Figure: diff-in-diff diagram with points labeled A, A’ and B]
• Example: Effect of unemployment on city crime rate
crimeit = β0 + δ0·d2t + β1·unemit + ai + uit, t = 1, 2
• d2t: time dummy for the second period
• ai: unobserved time-constant factors (= fixed effect)
• uit: other unobserved factors (= idiosyncratic error)
Two-Period Panel Data Analysis
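• Example: Effect of unemployment on city crime rate (cont.)
• Subtract the first-period equation from the second-period equation; the unobserved time-constant
fixed effect drops out:
Δcrimei = δ0 + β1·Δunemi + Δui
• Estimate the differenced equation by OLS
• The intercept δ0 captures the secular increase in crime between the two periods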
Two-Period Panel Data Analysis
• Discussion of first-differenced panel estimator
• Further explanatory variables may be included in the original equation
• Note that there may be arbitrary correlation between the unobserved time-invariant
characteristics and the included explanatory variables
• OLS in the original equation would therefore be inconsistent
• The first-differenced panel estimator is thus a way to consistently estimate causal effects in
the presence of time-invariant endogeneity
• For consistency, strict exogeneity has to hold in the original equation
• First-differenced estimates will be imprecise if the explanatory variables vary only a little over time
(no estimate is possible for time-invariant variables)
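A small simulation can make these points concrete. It is a sketch on synthetic data under assumed parameter values (beta1, delta0, and the strength of the correlation between unemployment and the fixed effect are all made up): pooled OLS in levels is biased by the omitted fixed effect, while OLS on the first-differenced equation recovers the slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000          # number of cities (assumed)
beta1 = 2.0       # assumed true effect of unemployment on crime
delta0 = 5.0      # assumed secular change between the two periods

a = rng.normal(0, 3, n)                  # unobserved fixed effect
# Unemployment is correlated with the fixed effect in both periods.
unem1 = 0.8 * a + rng.normal(0, 1, n)
unem2 = 0.8 * a + rng.normal(0, 1, n)
crime1 = 10 + beta1 * unem1 + a + rng.normal(0, 1, n)
crime2 = 10 + delta0 + beta1 * unem2 + a + rng.normal(0, 1, n)

def ols(X, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Pooled OLS in levels (with a second-period dummy) ignores the fixed effect.
y = np.concatenate([crime1, crime2])
x = np.concatenate([unem1, unem2])
d2 = np.concatenate([np.zeros(n), np.ones(n)])
pooled = ols(np.column_stack([np.ones(2 * n), d2, x]), y)

# First differences remove the fixed effect; the intercept estimates delta0.
fd = ols(np.column_stack([np.ones(n), unem2 - unem1]), crime2 - crime1)

beta1_pooled, delta0_fd, beta1_fd = pooled[2], fd[0], fd[1]
```

In this setup beta1_fd lands close to the assumed value of 2, while beta1_pooled is pushed upward because unemployment is positively correlated with the omitted fixed effect.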
Two-Period Panel Data Analysis
Fixed Effects Estimation
Consider the panel data model,
FatalityRateit = β0 + β1BeerTaxit + β2Zi + uit
Zi is a factor that does not change over time, at least during the years on which we have data
(example: density of cars on the road).
• Suppose Zi is not observed, so its omission could result in omitted variable bias.
• The effect of Zi can be eliminated using T = 2 years by the differencing method described above.
Fixed Effects Estimation
Yit = β0 + β1Xit + β2Zi + uit, i = 1,…,n, t = 1,…,T
We can rewrite this in two useful ways:
1. “n-1 binary regressor” regression model
2. “Fixed Effects” regression model
Fixed Effects Estimation
Population regression for California (that is, i = CA):
YCA,t = β0 + β1XCA,t + β2ZCA + uCA,t
= (β0 + β2ZCA) + β1XCA,t + uCA,t
Or
YCA,t = αCA + β1XCA,t + uCA,t
• αCA = β0 + β2ZCA doesn’t change over time
• αCA is the intercept for CA, and β1 is the slope
• The intercept is unique to CA, but the slope is the same in all the states: parallel lines.
Fixed Effects Estimation
YTX,t = β0 + β1XTX,t + β2ZTX + uTX,t
= (β0 + β2ZTX) + β1XTX,t + uTX,t (population regression for Texas)
or
YTX,t = αTX + β1XTX,t + uTX,t, where αTX = β0 + β2ZTX
Collecting the lines for all three states:
YCA,t = αCA + β1XCA,t + uCA,t
YTX,t = αTX + β1XTX,t + uTX,t
YMA,t = αMA + β1XMA,t + uMA,t
or
Yit = αi + β1Xit + uit, i = CA, TX, MA, t = 1,…,T
Fixed Effects Estimation
In binary regressor form:
Yit = β0 + γCADCAi + γTXDTXi + β1Xit + uit
• DCAi = 1 if state is CA, = 0 otherwise
• DTXi = 1 if state is TX, = 0 otherwise
• leave out DMAi (why?)
Fixed Effects Estimation
1. “n-1 binary regressor” form
Yit = β0 + β1Xit + γ2D2i + … + γnDni + uit
where D2i = 1 for i = 2 (state #2) and 0 otherwise, etc.
2. “Fixed effects” form:
Yit = β1Xit + αi + uit
• αi is called a “state fixed effect” or “state effect”: it is the constant (fixed) effect of being in state i
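• Fixed effects estimation
Yit = β1Xit + αi + uit, where the fixed effect αi is potentially correlated with the explanatory variables
• Form time averages for each individual: Ȳi = β1X̄i + αi + ūi
• Subtract the time averages: Yit – Ȳi = β1(Xit – X̄i) + (uit – ūi); the fixed effect is removed because αi – αi = 0
• Estimate the time-demeaned equation by OLS
• Uses time variation within cross-sectional units (= within estimator)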
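The time-demeaning steps above, and their equivalence to the “n-1 binary regressor” form, can be sketched with NumPy. The data are synthetic and the values of n, T, and beta1 are assumptions for illustration; demean_by is a hypothetical helper, not a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 50, 7
beta1 = -0.5                                  # assumed true slope

entity = np.repeat(np.arange(n), T)           # entity index i for each row
alpha = rng.normal(0, 2, n)                   # fixed effects alpha_i
X = 0.5 * alpha[entity] + rng.normal(0, 1, n * T)   # X correlated with alpha_i
Y = beta1 * X + alpha[entity] + rng.normal(0, 0.5, n * T)

def demean_by(group, v):
    """Subtract each group's mean from that group's observations."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

# Within estimator: OLS of time-demeaned Y on time-demeaned X.
Xd, Yd = demean_by(entity, X), demean_by(entity, Y)
beta_within = (Xd @ Yd) / (Xd @ Xd)

# "n-1 binary regressor" form: intercept, X, and dummies for entities 2..n.
dummies = (entity[:, None] == np.arange(1, n)[None, :]).astype(float)
design = np.column_stack([np.ones(n * T), X, dummies])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
beta_dummies = coef[1]
```

Both routes give the same slope estimate up to floating-point error. Demeaning avoids building n-1 dummy columns, which matters when n is large.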
Fixed Effects Estimation
An omitted variable might vary over time but not across states:
• Safer cars (air bags, etc.); changes in national laws
• These produce intercepts that change over time
Yit = β0 + β1Xit + β2Zi + β3St + uit
Fixed Effects Estimation with Time Fixed Effects
Yi,1982 = β0 + β1Xi,1982 + β3S1982 + ui,1982
= (β0 + β3S1982) + β1Xi,1982 + ui,1982
= λ1982 + β1Xi,1982 + ui,1982,
where λ1982 = β0 + β3S1982. Similarly,
Yi,1983 = λ1983 + β1Xi,1983 + ui,1983,
where λ1983 = β0 + β3S1983, etc.
Fixed Effects Estimation with Time Fixed Effects
1. “T-1 binary regressor” formulation:
Yit = β0 + β1Xit + δ2B2t + … + δTBTt + uit
where B2t = 1 when t = 2 (year #2) and 0 otherwise, etc.
2. “Time effects” formulation:
Yit = β1Xit + λt + uit
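The two time-effects formulations can be compared in the same way. This is a sketch on synthetic data with assumed values for beta1 and the time effects λt: the slope from the “T-1 binary regressor” regression matches the slope obtained after subtracting each period's cross-sectional mean, and leaving the time effects out biases the slope when X moves with λt.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 48, 7
beta1 = 1.5                                   # assumed true slope

period = np.tile(np.arange(T), n)             # time index t for each row
lam = np.linspace(0.0, 3.0, T)                # assumed time effects lambda_t
X = 0.7 * lam[period] + rng.normal(0, 1, n * T)    # X moves with lambda_t
Y = beta1 * X + lam[period] + rng.normal(0, 0.5, n * T)

def ols(design, y):
    """OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# "T-1 binary regressor" form: intercept, X, and dummies for periods 2..T.
B = (period[:, None] == np.arange(1, T)[None, :]).astype(float)
beta_dummies = ols(np.column_stack([np.ones(n * T), X, B]), Y)[1]

def demean_by(group, v):
    """Subtract each group's mean from that group's observations."""
    means = np.bincount(group, weights=v) / np.bincount(group)
    return v - means[group]

# Equivalent: remove each period's cross-sectional mean, then run OLS.
Xd, Yd = demean_by(period, X), demean_by(period, Y)
beta_demeaned = (Xd @ Yd) / (Xd @ Xd)

# Omitting the time effects altogether biases the slope.
beta_naive = ols(np.column_stack([np.ones(n * T), X]), Y)[1]
```

Entity and time effects can be combined by including both sets of dummies in one regression.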
• Discussion of fixed effects estimator
• Strict exogeneity in the original model has to be assumed
• The R-squared of the demeaned equation is inappropriate
• The effect of time-invariant variables cannot be estimated
Fixed Effects Estimation
Final Exam
• The exam will take place on Zoom on May 15, 9am-11am
• Let’s meet on Zoom at 8:45am to check that there are no technical issues.
• The exam will start at exactly 9am!
• Please make sure you have a good internet connection
• All cameras MUST be turned on
• You can ask questions during the exam ONLY in the private chat
• It is a closed-book exam; cheating on the final exam can result in serious consequences for the
student
• Handwriting must be legible!
• At 8:55am I will share the protected final exam file with the class
• At 11am the exam is over; you will take photos of your solutions and send them to my email
address during the meeting. I will close the exam meeting as soon as I get all your exam
solutions
• Don’t forget to write your name and surname in the email; in the SUBJECT of the email
you must write “Metrics Final Exam”.