Summary of Causal Inference Methods in Econometrics

Causal Inference Methods: Event Studies, DiD, RDD & IV


Introduction

Difference-in-differences (DiD) methods are widely used to evaluate policy rollouts that affect different places at different times. This guide focuses on practical issues that arise when treatment timing is staggered across units and on modern applied estimators that address those issues.

Definition: A staggered rollout is a treatment that arrives at different units in different periods.

Why staggered rollouts matter

Staggered adoption is common in real-world policies:

Municipal minimum wages adopted across cities over a decade.
EV charging-point expansions across European cities, 2014–2024.
Norwegian municipalities opening their first Vinmonopolet store.
Sweden's Systembolaget and Finland's Alko follow similar rollout logic.
Brazilian Bolsa Família-style transfers entering municipalities at different times.

💡 Fun fact: Staggered rollouts produce many different "cohorts" defined by their first-treated year, which creates both opportunities and pitfalls for causal estimation.

Two core features of real rollouts

Heterogeneity across cohorts: Early adopters and late adopters often differ in their response sizes because of different contexts or incentives.
Dynamic effects: Treatment effects can evolve over time after adoption (grow, fade, or oscillate).

Definition: A cohort is the set of units that receive treatment for the first time in the same period.
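The cohort definition above can be sketched in a few lines of Python. This is only an illustration: the panel rows, the unit labels, and the helper names `first_treated_year` and `cohorts` are hypothetical and not part of the original guide.

```python
from collections import defaultdict

# Hypothetical panel of (unit, year, treated) rows; values are made up.
panel = [
    ("A", 2014, 0), ("A", 2015, 1), ("A", 2016, 1),
    ("B", 2014, 0), ("B", 2015, 0), ("B", 2016, 1),
    ("C", 2014, 0), ("C", 2015, 0), ("C", 2016, 0),
    ("D", 2014, 0), ("D", 2015, 1), ("D", 2016, 1),
]

def first_treated_year(rows):
    """Map each unit to its first treated year, or None if never treated."""
    first = {}
    for unit, year, treated in rows:
        first.setdefault(unit, None)
        if treated and (first[unit] is None or year < first[unit]):
            first[unit] = year
    return first

def cohorts(rows):
    """Group units by first-treated year; the key None holds never-treated units."""
    groups = defaultdict(list)
    for unit, year in first_treated_year(rows).items():
        groups[year].append(unit)
    return {year: sorted(units) for year, units in groups.items()}

cohort_map = cohorts(panel)
# More than one distinct first-treated year means the rollout is staggered.
is_staggered = len([year for year in cohort_map if year is not None]) > 1
```

Here units A and D form the 2015 cohort, B forms the 2016 cohort, and C is never treated, so the rollout counts as staggered.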

Why standard TWFE can fail (intuitive)

The common two-way fixed effects (TWFE) regression pools all treated units and times into one coefficient. Under staggered timing, TWFE estimates are a weighted average of many 2x2 DiD comparisons. Some of those comparisons are invalid when treatment effects are dynamic or heterogeneous, producing bias and sometimes even estimates with the wrong sign.

The Goodman-Bacon decomposition (conceptual)

Goodman-Bacon (2021) shows the TWFE coefficient equals a weighted average of every possible 2x2 DiD between cohorts and time periods. These 2x2 pieces fall into three types:

1. Treated cohort vs. never-treated: good control.
2. Earlier-treated vs. later-treated, using the later cohort’s pre-treatment periods: good control.
3. Later-treated vs. earlier-treated, using the earlier cohort’s post-treatment periods: forbidden if effects are dynamic.

Why (3) is forbidden: the earlier-treated units already carry evolving post-treatment effects, so they are a moving target and contaminate the estimate.

Definition: A forbidden comparison uses as a control group units that are already affected by treatment in the comparison period.
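To make the three comparison types concrete, the sketch below enumerates every treated/control cohort pairing for a set of adoption years and labels each one. It only classifies the pairings and does not compute the Goodman-Bacon weights; the function name `classify_comparisons` and the example years are assumptions made for illustration.

```python
from itertools import combinations

def classify_comparisons(cohort_years, has_never_treated=True):
    """Return (treated_cohort, control_group, label) for each 2x2 pairing.

    "clean" means the control group is untreated during the comparison window;
    "forbidden" means the control group is already treated in that window.
    """
    years = sorted(cohort_years)
    out = []
    if has_never_treated:
        for g in years:
            out.append((g, "never", "clean"))
    for early, late in combinations(years, 2):
        # Early cohort treated; the later cohort is the control while it is
        # still in its own pre-treatment periods.
        out.append((early, late, "clean"))
        # Late cohort treated; the earlier cohort is the control although it
        # is already in its post-treatment periods.
        out.append((late, early, "forbidden"))
    return out

comparisons = classify_comparisons([2015, 2016, 2018])
n_forbidden = sum(1 for _, _, label in comparisons if label == "forbidden")
```

With three adoption cohorts and a never-treated group, this yields nine pairings, three of which are forbidden.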

Negative weights and their consequence

De Chaisemartin and D’Haultfœuille (2020) showed TWFE can place negative weights on some unit-time treatment effects. Consequences:

The TWFE estimate may not be a convex average of any meaningful ATT.
In extreme cases, the TWFE coefficient can have the opposite sign of every true effect.

💡 Did you know? A negative TWFE estimate does not necessarily mean the true treatment effects are negative. It can be an artifact of negative weighting combined with dynamic or heterogeneous effects.
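The negative-weight result can be checked by hand in a tiny balanced panel. In such a panel, and assuming parallel trends, the TWFE coefficient equals a weighted sum of the treated cells' effects, with weights proportional to the treatment dummy after removing unit and period means. The sketch below assumes a made-up design (one early cohort, one late cohort, no never-treated group) and made-up effect sizes; `twfe_weights` is a hypothetical helper, not a library function.

```python
def twfe_weights(D):
    """Weights TWFE puts on each treated cell in a balanced 0/1 panel D[unit][period]."""
    n, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    time_mean = [sum(D[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(unit_mean) / n
    # Treatment dummy residualised on unit and period fixed effects.
    resid = [[D[i][t] - unit_mean[i] - time_mean[t] + grand for t in range(T)]
             for i in range(n)]
    denom = sum(r * r for row in resid for r in row)
    return [[resid[i][t] / denom if D[i][t] else 0.0 for t in range(T)]
            for i in range(n)]

# Row 0: early cohort (treated from period 1). Row 1: late cohort (from period 3).
D = [[0, 1, 1, 1, 1],
     [0, 0, 0, 1, 1]]
# Illustrative true effects: every treated cell has a positive effect,
# and the early cohort's effect grows sharply over time.
tau = [[0, 1, 1, 10, 10],
       [0, 0, 0, 1, 1]]

w = twfe_weights(D)
beta = sum(w[i][t] * tau[i][t] for i in range(2) for t in range(5))
min_weight = min(w[i][t] for i in range(2) for t in range(5) if D[i][t])
total_weight = sum(w[i][t] for i in range(2) for t in range(5))
```

In this design the early cohort's late periods receive weight -1/3 each, so the implied TWFE coefficient is -5 even though every true effect is positive.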

The dynamic-effect problem (visualized)

When TWFE uses an earlier cohort that is still experiencing a rising post-treatment effect as a control for a later-treated cohort, the comparison subtracts the control's ongoing increase along with the common trend. The result understates the true effect for the later cohort.

Definition: Dynamic effects are treatment effects that change with time since treatment (e.g., build-up over years).
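A small numeric 2x2 example shows the size of the problem. The sketch assumes a noise-free world with a common linear trend, an effect that grows by one unit per period since adoption, and three hypothetical units (never-treated, early adopter in period 1, late adopter in period 3). All numbers are invented for illustration.

```python
def effect(event_time):
    """Illustrative dynamic effect: 1 at adoption, growing by 1 each period."""
    return event_time + 1 if event_time >= 0 else 0

def y_never(t):
    return 1 + 0.5 * t                     # unit level + common trend

def y_early(t):
    return 5 + 0.5 * t + effect(t - 1)     # adopts in period 1

def y_late(t):
    return 3 + 0.5 * t + effect(t - 3)     # adopts in period 3

pre, post = 2, 3                           # window around the late adoption
true_effect = effect(0)                    # late cohort's effect in period 3

# Clean comparison: the control is never treated.
clean_did = (y_late(post) - y_late(pre)) - (y_never(post) - y_never(pre))

# Forbidden comparison: the control is the early cohort, whose own effect
# rises from 2 to 3 over the same window.
forbidden_did = (y_late(post) - y_late(pre)) - (y_early(post) - y_early(pre))
```

The clean comparison recovers the true effect of 1, while the forbidden comparison returns 0 because the early cohort's effect grew by exactly 1 over the window.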

When TWFE is valid

TWFE delivers the Average Treatment effect on the Treated (ATT), assuming parallel trends, if at least one of the following conditions holds:

There is no staggered timing (all treated units adopt at the same time), or
Treatment effects are constant across cohorts and over time.

In most policy settings, neither condition holds; therefore, a single TWFE coefficient is usually unreliable.
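Both benign cases can be verified on noise-free toy panels. The sketch below computes the TWFE coefficient for a balanced panel by residualising the treatment dummy on unit and period means, which matches the regression coefficient in this setting. The helpers `twfe_coefficient` and `make_outcome`, the unit levels, the trend, and the effect sizes are all assumptions for illustration.

```python
def twfe_coefficient(Y, D):
    """TWFE coefficient on D for a balanced panel Y[unit][period], D[unit][period]."""
    n, T = len(D), len(D[0])
    unit_mean = [sum(row) / T for row in D]
    time_mean = [sum(D[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(unit_mean) / n
    resid = [[D[i][t] - unit_mean[i] - time_mean[t] + grand for t in range(T)]
             for i in range(n)]
    num = sum(resid[i][t] * Y[i][t] for i in range(n) for t in range(T))
    den = sum(r * r for row in resid for r in row)
    return num / den

def make_outcome(D, tau, alpha):
    """Outcome = unit level + common linear trend + cell-specific effect if treated."""
    return [[alpha[i] + 0.5 * t + tau[i][t] * D[i][t] for t in range(len(D[0]))]
            for i in range(len(D))]

# Case A: staggered timing, but a constant effect of 2 in every treated cell.
D_a = [[0, 1, 1, 1, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 0]]
tau_a = [[2] * 5 for _ in range(3)]
beta_a = twfe_coefficient(make_outcome(D_a, tau_a, [5, 3, 1]), D_a)

# Case B: common timing (period 2), heterogeneous and dynamic effects.
D_b = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
tau_b = [[0, 0, 1, 3], [0, 0, 2, 6], [0, 0, 0, 0]]
beta_b = twfe_coefficient(make_outcome(D_b, tau_b, [5, 3, 1]), D_b)
att_b = (1 + 3 + 2 + 6) / 4                # average effect over treated cells
```

In case A the coefficient equals the constant effect of 2, and in case B it equals the ATT of 3, because every treated cell receives the same weight when timing is common.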

Diagnosis checklist (quick)

Are units treated at different times? If yes, proceed cautiously.
Co


Staggered DiD Estimators

Keywords: Event studies (causal inference), Event studies (finance & time series), Event studies (econometrics & DiD), Difference-in-differences methods & theory, Difference-in-differences applied estimators & issues, Applications, Instrumental variables, Regression discontinuity

Key concepts:

A staggered rollout is treatment arriving at different units in different periods.
TWFE can be biased under staggered timing when effects are dynamic or heterogeneous.
Goodman-Bacon decomposes TWFE into all 2x2 DiD pieces, some invalid.
Forbidden comparisons use already-treated units as controls and contaminate estimates.
TWFE can assign negative weights, so its sign can be misleading.
Callaway–Sant'Anna constructs group-time ATTs robust to heterogeneity and dynamics.
did2s residualises using untreated observations then regresses on treatment.
Diagnose by plotting treatment timing and cohort event-time dynamics.
Prefer cohort-specific effects or aggregated ATTs over single pooled TWFE.
Use cluster-robust or estimator-appropriate inference (bootstrap or analytic).
