Causal Inference Methods: Event Studies, DiD, RDD & IV
Length: 26 minutes
Did the App Cause It?
The Rooster and the Sunrise
The Magic of the Counterfactual
Stock Market Shockwaves
One Timeline, Two Trends
When Trends Don't Match
Old Methods, Blurry Pictures
A Modern Data Toolkit
The Bottom Line
Two Naive Mistakes
The Double Difference
The Golden Rule
A Second Opinion
Three Tools, One Answer
When Things Get Messy
The EV Charging Story
The Selection Problem
The Clever Solution
Compliers and Local Effects
A Sudden Jump
The Logic of the Ledge
The Local Effect
Wrapping Up
Ava: Imagine a student named Tomáš. Mediocre grades, motivation at rock bottom. One day he downloads a new study app. And lo and behold, a week later he scores twenty percent higher on a test. The question we're asking today is: did the app cause it?
Ethan: That's a great question, Ava. And it's exactly the kind of problem that so-called event studies are built for. The quick answer is... maybe?
Ava: Well, that's not exactly satisfying. This is the Studyfi Podcast, where we look for slightly better answers.
Ethan: Alright, let's dig in. An event study is essentially a clever way of comparing "before" and "after". At some point an event occurs, and we look at what changed as a result.
Ava: Like with Tomáš. The event was downloading the app, and the change was the better test score. That sounds really simple. Maybe too simple?
Ethan: Exactly, there's a catch. You know the classic example? A rooster crows every morning, and right afterwards the sun comes up. Does that mean the rooster's crowing causes the sunrise?
Ava: Of course not. The sun would have risen anyway, even if the rooster had slept until noon.
Ethan: Exactly! And that's the main problem we face. We have to be sure that the change we see wasn't caused by something else entirely that happened at the same time. Maybe Tomáš started sleeping more, or the test was simply easier.
Ava: Okay, I get it. So how do we separate the real effect of the app from... well, from that proverbial rooster?
Ethan: The key is to estimate the so-called counterfactual. It sounds complicated, but it's just a fancy word for what would have happened if the event had never occurred.
Ava: Ah! So we're trying to predict what score Tomáš would have gotten if he had never downloaded the app?
Ethan: Exactly. We construct a hypothetical timeline. Then we compare the actual "after" result with that hypothetical one. The difference between them... that's our best estimate of the true effect.
Ava: So it's really a method for not getting fooled by coincidence or by other influences.
Ethan: Beautifully put. It's one of the fundamental tools for tracking down true cause and effect in a world full of data.
Ava: Okay, so that's a brilliant way to isolate an effect: compare reality to its counterfactual. But what if you can't build a perfect comparison? Where else do we see this 'what if' thinking?
Ethan: Great question. It takes us straight to the classic home of event studies: finance... specifically, the stock market.
Ava: Why the stock market? Is it just because that's where the money is?
Ethan: Well, partly! But it's really because of two things: tons of high-frequency data, and something called the 'efficient markets hypothesis'.
Ava: Which basically says that stock prices reflect all available information, right?
Ethan: Exactly. So when new information hits—like a surprise announcement—the effect on the stock price is almost instant. It’s like dropping a rock into a perfectly still pond. You see the ripple immediately.
Ava: So you don't have to wait months to see the impact. You can measure it in minutes or hours.
Ethan: You got it. The method is pretty neat. First, you take a period *before* the event—the 'estimation period'—and build a model of what the stock's return *should* have been on a normal day.
Ava: That's our counterfactual again. The boring timeline where nothing happened.
Ethan: Precisely. Then you compare that prediction to the *actual* return right around the event. That difference is called the 'abnormal return'.
Ava: And that's our effect? The 'surprise' quantified?
Ethan: Yes! A positive abnormal return means the market loved the news. A negative one means someone's having a very bad day. You can literally see the market react in real-time.
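For listeners who want to see the mechanics, here is a minimal sketch in Python. Everything in it is invented for illustration: the simulated returns, the market-model coefficients, and the 'surprise' injected around day 100. But the steps mirror what Ethan describes: fit a normal-day model on the estimation window, then measure abnormal returns around the event.

```python
import numpy as np
import statsmodels.api as sm

# Simulated daily returns; in practice these come from real price data.
rng = np.random.default_rng(0)
market_ret = rng.normal(0.0005, 0.01, 120)            # market index returns
stock_ret = 0.0002 + 1.2 * market_ret + rng.normal(0, 0.005, 120)
stock_ret[100:103] += 0.03                            # a fake 'surprise' at day 100

# Estimation window: days 0-89, well before the event.
X = sm.add_constant(market_ret[:90])
market_model = sm.OLS(stock_ret[:90], X).fit()        # market model: r_stock = a + b * r_market

# Event window: predict the 'normal' return, subtract it from the actual one.
event_window = slice(95, 106)
predicted = market_model.predict(sm.add_constant(market_ret[event_window]))
abnormal = stock_ret[event_window] - predicted        # abnormal returns
print(abnormal.cumsum()[-1])                          # cumulative abnormal return
```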
Ava: That's fascinating. But it relies on having a stock market. What if you're just looking at one long timeline, like a country's unemployment rate after a new law is passed?
Ethan: Excellent point. For that, we can use a related idea called 'segmented regression'.
Ava: That sounds a bit more intimidating.
Ethan: It's actually really intuitive! Think of it this way: you just draw the trend line for the data *before* the event. Then you draw a separate trend line for the data *after* the event.
Ava: And you just... look to see if the second line is different from the first?
Ethan: That's the core of it! We ask two questions. One, was there an immediate jump or drop right when the event happened? And two, did the *slope* of the trend change? Did things start improving faster, or slower, afterwards?
Ava: So you're comparing the 'after' reality to an extension of the 'before' trend. What's the catch?
Ethan: The catch is that you have to be confident your 'before' trend was correctly specified. A great way to check this is with a placebo test.
Ava: Let me guess. You run the same test on a timeline where the event *didn't* happen?
Ethan: Exactly! And if you find a big effect there... well, it means your model is seeing ghosts.
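As a sketch, segmented regression and its placebo check fit in a few lines. The series, the event date, and the placebo date below are all made up:

```python
import numpy as np
import statsmodels.api as sm

# Simulated monthly series with a real break at month 60.
rng = np.random.default_rng(1)
t = np.arange(120)
event = 60
post = (t >= event).astype(float)
y = 5 + 0.05 * t - 2.0 * post - 0.03 * post * (t - event) + rng.normal(0, 0.5, 120)

# Segmented regression: pre-trend, level jump at the event, slope change after it.
X = sm.add_constant(np.column_stack([t, post, post * (t - event)]))
print(sm.OLS(y, X).fit().params)   # [intercept, pre-slope, jump, slope change]

# Placebo: pretend the event happened at month 30, using pre-event data only.
# A large 'jump' there would mean the model is seeing ghosts.
placebo = (t >= 30).astype(float)
Xp = sm.add_constant(np.column_stack([t, placebo, placebo * (t - 30)]))
print(sm.OLS(y[:event], Xp[:event]).fit().params)
```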
Ava: I love that. A ghost-hunting method for statisticians! So, what happens when we move from one timeline to having multiple groups again?
Ethan: It's a fantastic question, Ava. Moving to multiple groups is where this gets really powerful... but also where we can get into trouble. It all comes back to that key idea we talked about: parallel trends.
Ava: Right, the idea that both groups were moving in the same direction before the event happened.
Ethan: Exactly. Let's use an example. Imagine Samwise Gamgee from Lord of the Rings puts a cap on how many hours hobbits can work harvesting pipe-weed.
Ava: Okay, I'm with you. A labor cap in the Shire.
Ethan: He runs his study and sees that pipe-weed production dropped after his new rule. He thinks, "Great! My policy worked!" But then he looks closer at the data...
Ava: And what does he see?
Ethan: He sees that pipe-weed production was *already* falling before he even made the rule. The trend was already going down.
Ava: Oh, so it wasn't a fair comparison. The decline might have happened anyway.
Ethan: Precisely. That's a failed parallel trends test. It's like trying to prove your new diet works when you were already losing weight before you started it. The pre-event trends weren't parallel, so we can't trust the result.
Ava: So that's the big red flag to look for. Are there others?
Ethan: There's one big one. For a long time, researchers used a method called two-way fixed effects, or TWFE, which we can just call the 'old way'. The problem is, when you have groups getting the treatment at different times—what we call staggered adoption—this old method gets... confused.
Ava: Confused how?
Ethan: It ends up using groups that have *already* been treated as part of the control group for groups treated later. It's a mess.
Ava: That sounds like it would cause problems.
Ethan: Big problems. Think of it like taking a photo with a shaky hand. The 'old way' gives you a smudged, blurry picture of the treatment effect. It often understates the true impact. It basically puts a fake mustache on an old trend and tries to pass it off as something new.
Ava: A fake mustache! I love that. So what's the solution? A better camera?
Ethan: A much better camera! Modern methods, like one called Callaway and Sant'Anna, are designed to fix this. They're much smarter about choosing the right comparison group.
Ava: So what does a modern workflow look like for a researcher?
Ethan: First, you plot everything out. You actually make a picture to see when different groups get treated. Then, you look at the raw data for each group to spot any obvious trend problems, like with Sam's pipe-weed.
Ava: So you start with your eyes, not just with equations.
Ethan: Always. After that, you run the modern, smarter models. These models carefully compare each treated group to only the units that *haven't* been treated yet. It gives you a much cleaner, sharper picture.
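To make the 'compare only to the not-yet-treated' idea concrete, here is a toy version of the core comparison on a simulated staggered panel. It is not the full Callaway and Sant'Anna estimator (which also handles covariates, aggregation, and proper inference), just the group-time building block:

```python
import numpy as np
import pandas as pd

# Simulated panel: cohorts first treated at period 4, period 6, or never (g = 0).
rng = np.random.default_rng(2)
rows = []
for unit in range(60):
    g = rng.choice([0, 4, 6])
    for t in range(8):
        effect = 1.0 if (g > 0 and t >= g) else 0.0   # true treatment effect = 1
        rows.append((unit, t, g, 0.1 * unit + 0.2 * t + effect + rng.normal(0, 0.3)))
df = pd.DataFrame(rows, columns=["unit", "t", "g", "y"])

def att_gt(df, g, t):
    """2x2 DiD for cohort g at period t, against units not yet treated at t."""
    treated = df[df.g == g]
    control = df[(df.g == 0) | (df.g > t)]            # never- or not-yet-treated
    d_treated = treated[treated.t == t].y.mean() - treated[treated.t == g - 1].y.mean()
    d_control = control[control.t == t].y.mean() - control[control.t == g - 1].y.mean()
    return d_treated - d_control

print(att_gt(df, g=4, t=5))   # effect for the period-4 cohort, close to 1.0
```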
Ava: Okay, so to recap... Difference-in-Differences, or DiD, is like an event study but with a control group.
Ethan: Yep. And the whole thing rests on the parallel trends assumption—that the groups were behaving similarly beforehand.
Ava: If the pre-trends aren't flat, like in Sam's case, we should be suspicious. And we should use modern methods to avoid the blurry, 'fake mustache' results of the past.
Ethan: You've got it. These new tools don't magically fix bad data, but they stop us from making unforced errors in our analysis. Which is a huge step forward.
Ava: A huge step indeed. Now, speaking of steps... what's the next logical step for researchers once they have these initial results?
Ethan: That’s a fantastic question, Ava. The next step is all about honesty. We have to be honest with ourselves about what we can *truly* measure.
Ava: Honest? What do you mean? Are you saying researchers lie?
Ethan: Not intentionally! But it's easy to fool yourself. The core problem in all of this is finding a good 'counterfactual'. We want to know the causal effect of something, right?
Ava: Right. Like, does this new teaching method actually improve test scores?
Ethan: Exactly. To know that, we'd need to see the test scores of students who used the new method... and also see *their own scores* if they *hadn't* used it. But we can't. We only see one reality.
Ava: It’s like the hospital example. We can't know how sick a patient would have been if they hadn't gone to the hospital. We can only see what did happen.
Ethan: Precisely. We're always missing that counterfactual. And when we try to fake it with a bad comparison, we make some classic mistakes.
Ava: Okay, so what are the classic bad comparisons? The research 'don'ts'?
Ethan: The research 'don'ts'. I like that. The first is the simple 'before-versus-after' comparison. You just look at the group that got the treatment and see if they changed.
Ava: That sounds... too simple. What's wrong with it?
Ethan: Well, other things might have changed at the same time! The source material has a great example with Saruman from Lord of the Rings. Let's say he introduces a new wage rule for his ring-seeking orcs, and then recruitment falls.
Ava: So he thinks his policy worked.
Ethan: Right! But maybe orc recruitment was falling everywhere because the Ents became really persuasive speakers that year. His policy had nothing to do with it.
Ava: Okay, that makes sense. So what's mistake number two?
Ethan: Mistake two is comparing the treated group to a control group, but only *after* the treatment. Saruman compares his ring-seeking orcs to his Isengard industry orcs.
Ava: Let me guess... those groups were probably different to begin with. The ring-seekers are probably way more adventurous and worse at filling out paperwork.
Ethan: You got it. That's called selection bias. The groups weren't comparable from the start.
Ava: So... one method ignores time, the other ignores that the groups are different. They both seem pretty flawed.
Ethan: They are! But here’s the beautiful part. The method called 'Difference-in-Differences'—or DiD for short—combines these two flawed ideas to create one really clever one.
Ava: Okay, you have my attention. How does that work?
Ethan: Think of it like a two-step calculation. First, you calculate the change over time for the treated group... so their 'after' minus their 'before'.
Ava: Alright, that's the first difference.
Ethan: Then, you do the exact same thing for the control group. You calculate *their* change over time. That's the second difference.
Ava: And then... you subtract the control group's change from the treated group's change?
Ethan: Bingo! You find the *extra* change that happened only to the treated group, after accounting for the trend that was happening to everyone. It's brilliant.
Ava: That actually is brilliant. It neatly solves both problems at once. It controls for time trends *and* for those starting differences between the groups.
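With made-up group averages, the whole calculation fits in a few lines:

```python
# Hypothetical group means, before and after the treatment.
treated_before, treated_after = 10.0, 14.0
control_before, control_after = 9.0, 11.0

first_diff = treated_after - treated_before    # change for the treated: 4.0
second_diff = control_after - control_before   # change for the controls: 2.0

did = first_diff - second_diff                 # the 'extra' change: 2.0
print(did)
```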
Ethan: It does. But... it all depends on one huge, crucial assumption. It's the golden rule of DiD.
Ava: There's always a catch, isn't there? What is it?
Ethan: It’s called the 'Parallel Trends' assumption. In simple terms, it means that if the treatment had never happened, the two groups would have continued to change in the same way. Their trends would have been parallel.
Ava: Ah, I see. So you need to believe that the control group is a good stand-in for what *would have happened* to the treated group.
Ethan: Exactly. We use the control group's trend to build that missing counterfactual we talked about. If their trends weren't parallel to begin with, the whole calculation falls apart.
Ava: So, to recap: Difference-in-Differences is a powerful tool for finding causal effects. But it only works if we can assume that our treated and control groups were on similar paths before the experiment started. Which leads to the obvious question... how do we actually check that?
Ethan: That's the million-dollar question. And it's why statisticians never stop creating new tools. One popular way to double-check our work is with a method called 'did2s'.
Ava: Did... two ess? Sounds like a band from the 90s.
Ethan: Close! It stands for Difference-in-Differences in two stages. It's a clever approach developed by John Gardner. Think of it this way...
Ava: Okay, I'm ready.
Ethan: First, the model looks *only* at the untreated group to figure out the normal, everyday changes over time. It gets a baseline. Take the Norwegian study we're about to discuss, where some places got a new alcohol store and others didn't.
Ava: So it learns the expected pattern from the places that *didn't* get a new alcohol store.
Ethan: Exactly. Then, in stage two, it subtracts that baseline pattern from *everyone*. What's left over should just be the treatment effect. It's a different way to get to the same answer.
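A stripped-down sketch of those two stages on a simulated panel might look like this. (Real did2s also corrects the standard errors for the two-step procedure, which this toy version ignores.)

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: half the units become treated (d = 1) from period 3 on.
rng = np.random.default_rng(3)
rows = []
for unit in range(40):
    for t in range(6):
        d = int(unit < 20 and t >= 3)
        rows.append((unit, t, d, 0.1 * unit + 0.3 * t + 1.5 * d + rng.normal(0, 0.3)))
df = pd.DataFrame(rows, columns=["unit", "t", "d", "y"])

# Stage 1: learn the baseline unit and time patterns from untreated observations only.
stage1 = smf.ols("y ~ C(unit) + C(t)", data=df[df.d == 0]).fit()

# Stage 2: subtract that baseline from everyone; what's left is the treatment effect.
df["y_resid"] = df["y"] - stage1.predict(df)
stage2 = smf.ols("y_resid ~ d - 1", data=df).fit()
print(stage2.params["d"])   # close to the true effect of 1.5
```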
Ava: So did you run this 'did2s' model on the Norwegian data?
Ethan: We did. We compared three different estimators. First, our simple, 'naive' model. Second, the Callaway and Sant'Anna method we discussed. And third, this new did2s.
Ava: And what did you find?
Ethan: Here's the cool part. The naive model gave us a result of about 0.96. But the other two methods—the more robust ones—both came in much higher, around 1.3 to 1.4.
Ava: Wow, that's a big difference. It means the simple model was underestimating the effect.
Ethan: By a lot. The key takeaway is that the two more advanced methods basically agreed with each other. It gives us much more confidence in the result. It's like getting a second and third opinion from a doctor, and they both say the same thing.
Ava: That makes sense. It seems like these newer methods are crucial for getting an accurate picture. But what happens when the real world gets even more complicated?
Ethan: Well, it often does. So far, we've talked about a simple case: one group gets treated, one doesn't. But what if the treatment rolls out over time?
Ava: What do you mean by 'rolls out'?
Ethan: Think about a new minimum wage law. It doesn't hit every city in the country on the same day. Some cities might adopt it in 2020, others in 2022, and some in 2024. That's a staggered rollout.
Ava: I see. And these staggered rollouts... I'm guessing they create a whole new set of problems for us?
Ethan: You guessed right. They can seriously break our traditional models, and that's exactly what we need to tackle next.
Ava: Okay, so break it down for me. Give me a real-world case where these staggered rollouts really mess things up.
Ethan: Happy to. Let's talk about electric cars and clean air. A perfect, recent example.
Ava: I'm listening. This sounds important.
Ethan: Between 2014 and 2022, many cities across Europe rushed to install public EV charging stations. But they all did it on their own schedule. A classic staggered rollout.
Ava: And the big question was… did it actually work? Did all those chargers help reduce air pollution?
Ethan: Exactly. We wanted to see if nitrogen dioxide—or NO₂—levels went down. But here's the catch… the effect isn't instant. It builds up over years as more people gradually switch to EVs.
Ava: Ah, so it's a slow burn. And that's what breaks the traditional model?
Ethan: You got it. When we use that older Two-Way Fixed Effects model, it tells us NO₂ dropped by about 0.9 points. Which is nice, but…
Ava: But not the whole story?
Ethan: Not even close! The newer method, Callaway and Sant'Anna, shows the *real* effect was a drop of 1.8 points. The old model missed fully half of the positive impact!
Ava: Wow! So the policy was twice as effective as we first thought? It’s like using a blurry camera versus a high-definition one.
Ethan: That's a perfect way to put it. The old model gave us a fuzzy, misleading picture. It dramatically understated the good news.
Ava: The key takeaway here is pretty clear then. Using the right tool for the job is absolutely critical to see what's actually happening.
Ethan: Precisely. And understanding that distinction is key as we move on to our next big idea.
Ava: Okay, so moving on... you mentioned a 'next big idea.' What happens when we can't just find a better measurement tool? What if we can't run a perfect experiment?
Ethan: That's the million-dollar question in so much of social science. And the answer is... we get clever. We look for something called an instrumental variable.
Ava: Instrumental variable. Sounds... technical.
Ethan: It does, but the idea is actually beautiful. Let's start with the problem we're trying to solve. It's called the selection problem.
Ava: Okay...
Ethan: Think about this simple question: do people with college degrees earn more money? If you just compare the two groups, the answer looks like a huge 'yes'.
Ava: Right. Seems obvious.
Ethan: But wait. Are the people who *choose* to go to college the same as those who don't? They might be more motivated, have more family support... things that also lead to higher earnings.
Ava: Ah, so you can't tell if it's the college degree itself or the person's own drive that's causing the higher salary. The effects are all tangled up.
Ethan: Exactly! The college group has 'self-selected' based on traits we can't see. That's the selection problem in a nutshell.
Ava: So how do we untangle it? This is where your 'instrument' comes in?
Ethan: Yep. We need to find a third variable—the instrument—that pushes some people into college but has nothing to do with their secret motivation or ability. Think of it like a random nudge.
Ava: A random nudge... like a lottery?
Ethan: A lottery is the perfect example! In fact, one of the most famous studies on this used the Vietnam War draft lottery.
Ava: The draft? How does that relate to earnings?
Ethan: Well, whether you got a low lottery number was completely random. It was a pure chance event. But that low number dramatically increased the chance you'd serve in the military.
Ava: Okay, I see the random nudge part.
Ethan: So researchers could compare men with low lottery numbers to men with high numbers. The random number is the 'instrumental variable' because it's not connected to anything else about them.
Ava: So it isolates the effect of serving in the military from all that other background noise.
Ethan: Precisely. The instrument has to affect your outcome *only* through its effect on the thing you're studying—in this case, serving in the army. And the result was that for this group, serving in the military actually led to lower earnings later in life.
Ava: Wow. So the instrument found an effect that was totally hidden before. But here's a question... does this apply to everyone who served?
Ethan: Great question. And the answer is no. This is the crucial part. It only tells us the effect for the specific people who were influenced by the nudge.
Ava: What do you mean?
Ethan: It's the effect for the men who served *only because* they got a low draft number. Economists call these people 'compliers'.
Ava: 'Compliers'. Like they're just following orders from the random number generator.
Ethan: Exactly! It doesn't tell us about the career soldiers who would have enlisted anyway. It gives us a 'Local Average Treatment Effect'—just for that group on the fence.
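Here is a simulated version of that logic. Everything is invented, but the structure mirrors the draft-lottery design: a random instrument, an unobserved 'motivation' confounder, and a naive comparison that, in this rigged example, even gets the sign of the effect wrong:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
motivation = rng.normal(0, 1, n)                  # unobserved confounder
z = rng.integers(0, 2, n)                         # the random nudge (instrument)
d = (0.6 * z + motivation + rng.normal(0, 1, n) > 0.5).astype(int)   # 'served'
y = 50 - 2.0 * d + 5.0 * motivation + rng.normal(0, 5, n)            # true effect: -2

# Naive comparison: badly biased, because motivation drives both d and y.
print(y[d == 1].mean() - y[d == 0].mean())        # comes out positive here

# Wald estimator: the nudge's effect on y, scaled by its effect on d.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(wald)                                       # roughly -2: the LATE for compliers
```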
Ava: The key takeaway then is that an instrumental variable is a powerful workaround. It uses a naturally random event to isolate a true cause-and-effect relationship for a very specific group.
Ethan: You've got it. And finding these clever instruments is a huge part of modern research. Some of them are truly ingenious, which brings us to another famous example involving when you're born...
Ava: Okay, I'm hooked. So how does *when* you're born create one of these natural experiments?
Ethan: Great question. This brings us to our final method, and it’s one of my favorites: Regression Discontinuity Design, or RDD. The core idea is that sometimes, a single arbitrary cutoff point completely changes someone's reality.
Ava: A cutoff point? What do you mean?
Ethan: Think of it this way—the legal drinking age in the U.S. is 21. Are you magically a more responsible person the day you turn 21 compared to the day before?
Ava: Definitely not. I think I was probably less responsible for a few weeks *after* I turned 21.
Ethan: Exactly! But the law treats you completely differently. RDD exploits that sharp change. We can plot a graph where the x-axis, our 'running variable', is age. And the y-axis is our outcome, maybe something like hospital visits.
Ava: And you'd expect to see a smooth line, right? People don't suddenly change overnight.
Ethan: Precisely. The core assumption is that nature doesn't make sudden jumps. But policies do. If we see a sudden jump, a discontinuity, in hospital visits right at the 21st birthday cutoff... we can be pretty confident it's caused by the legal access to alcohol.
Ava: So you're comparing people who are *almost* 21 to people who are *just* 21. They're basically identical twins, except one group can legally buy a drink.
Ethan: You've nailed it. That's the continuity assumption. We assume everything else is smooth across that cutoff. By doing that, we isolate the causal effect of the policy, but only for people right around that specific age.
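A toy version of that comparison, with a simulated jump of 0.8 built in at the cutoff (the bandwidth and all the numbers are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

# Simulated running variable: age, with a policy cutoff at 21.
rng = np.random.default_rng(5)
age = rng.uniform(19, 23, 5000)
over = (age >= 21).astype(float)
# A smooth trend in the outcome, plus a true jump of 0.8 exactly at 21.
visits = 2.0 + 0.1 * (age - 21) + 0.8 * over + rng.normal(0, 0.5, 5000)

# Local linear RDD: inside a bandwidth, fit separate slopes on each side.
h = 0.5                                  # bandwidth (a key tuning choice in practice)
win = np.abs(age - 21) <= h
x = age[win] - 21
X = sm.add_constant(np.column_stack([x, over[win], over[win] * x]))
fit = sm.OLS(visits[win], X).fit()
print(fit.params[2])                     # the estimated jump at the cutoff, near 0.8
```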
Ava: Ah, so it’s another 'Local Average Treatment Effect', just like with the instrumental variables.
Ethan: You got it. It's an incredibly clever and transparent way to find cause-and-effect when you have a clear, arbitrary rule creating a cutoff.
Ava: Wow. So from finding a 'randomizer' that influences a choice to finding a sharp 'cliff' created by a rule... these methods are all about finding clever ways to isolate a single cause. It’s like being a detective.
Ethan: It really is. And that’s all the time we have for today. Thanks for joining us on the Studyfi Podcast.
Ava: We hope you leave a little more curious than when you arrived. Until next time, keep asking questions. Goodbye everyone!