Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event study in Stata #145

Open
pdeffebach opened this issue Jun 21, 2021 · 9 comments
Open

Event study in Stata #145

pdeffebach opened this issue Jun 21, 2021 · 9 comments

Comments

@pdeffebach
Copy link

First of all, thank you for this wonderful resource!

I am confused by the Stata event study code, and think it might not be totally correct. For reference, here it is

use "https://raw.githubusercontent.com/LOST-STATS/LOST-STATS.github.io/master/Model_Estimation/Data/Event_Study_DiD/bacon_example.dta", clear

* create the lag/lead for treated states
* fill in control obs with 0
* This allows for the interaction between `treat` and `time_to_treat` to occur for each state.
* Otherwise, there may be some NAs and the estimations will be off.
g time_to_treat = year - _nfd
replace time_to_treat = 0 if missing(_nfd)
* this will determine the difference
* btw controls and treated states
g treat = !missing(_nfd)

* Stata won't allow factors with negative values, so let's shift
* time-to-treat to start at 0, keeping track of where the true -1 is
summ time_to_treat
g shifted_ttt = time_to_treat - r(min)
summ shifted_ttt if time_to_treat == -1
local true_neg1 = r(mean)

* Regress on our interaction terms with FEs for group and year,
* clustering at the group (state) level
* use ib# to specify our reference group
reghdfe asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases, a(stfips year) vce(cluster stfips)

My problem stems from the line

replace time_to_treat = 0 if missing(_nfd)

This means that states which are not treated are given 0, meaning they are treated in that year. This gives the following


time_to_tre	
at	Freq.	Percent	Cum.
			
-21	1	0.06	0.06
-20	2	0.12	0.19
-19	2	0.12	0.31
-18	2	0.12	0.43
-17	2	0.12	0.56
-16	3	0.19	0.74
-15	3	0.19	0.93
-14	3	0.19	1.11
-13	6	0.37	1.48
-12	7	0.43	1.92
-11	9	0.56	2.47
-10	12	0.74	3.22
-9	22	1.36	4.58
-8	25	1.55	6.12
-7	32	1.98	8.10
-6	34	2.10	10.20
-5	36	2.23	12.43
-4	36	2.23	14.66
-3	36	2.23	16.88
-2	36	2.23	19.11
-1	36	2.23	21.34
0	465	28.76	50.09
1	36	2.23	52.32
2	36	2.23	54.55
3	36	2.23	56.77
4	36	2.23	59.00
5	36	2.23	61.22
6	36	2.23	63.45
7	36	2.23	65.68
8	36	2.23	67.90
9	36	2.23	70.13
10	36	2.23	72.36
11	36	2.23	74.58
12	35	2.16	76.75
13	34	2.10	78.85
14	34	2.10	80.95
15	34	2.10	83.06
16	34	2.10	85.16
17	33	2.04	87.20
18	33	2.04	89.24
19	33	2.04	91.28
20	30	1.86	93.14
21	29	1.79	94.93
22	27	1.67	96.60
23	24	1.48	98.08
24	14	0.87	98.95
25	11	0.68	99.63
26	4	0.25	99.88
27	2	0.12	100.00
			
Total	1,617	100.00

It's possible that because in control units, time_to_treat does not vary across years, the state (stfips) fixed effects "take care" of this. But I can't intuitively reason about what's really happening given 0 stands for both untreated and treated, but year 0.

I would recommend making the time_to_treat variable 100 or the maximum plus 100, to avoid this confusion. The values don't matter since they are used as fixed effects anyways.

@grantmcdermott
Copy link
Contributor

I think you're correct. It doesn't make sense to have time_to_treatment==0 represent two different states. (Schrödringer's zero?) Now, from a practical perspective, I don't think it matters since the treat and ib#shited_ttt variables will cancel things it. But I agree it's logically confusing.

Taking a higher-level look at this event study page, there are several other things we could change. We shouldn't be using a proprietary data format (dta) for one thing, and the R code as currently written won't work with the impending fixest 0.9.0 release. (It could also be considerably shortened.) I'll make some changes and then let someone else pitch in about the Stata code.

@grantmcdermott
Copy link
Contributor

I've pushed a couple of updates. But really I think this page needs a closer look. I just ran the Stata implementation of Sun Abraham and got different results to the R version... I don't have time to troubleshoot now and it might just be a copy-paste error on my side. But I'm hoping @pdeffebach or @NickCH-K can double check.

Similarly, I tried the Callaway-Sant'Anna estimator and got different results once again. I spoke v. briefly to Pedro and he thinks it's likely due to covariates, which SA don't handle in their paper. Again, though, it could be user error on my part. Here's the code I was trying locally:

library(did)

## Change never treated to zero
dat[, year_treated := ifelse(treat==0, 0, `_nfd`)]

mod_cs = att_gt(yname = "asmrs",
gname = "year_treated",
idname = "stfips",
tname = "year",
xformla = ~ pcinc + asmrh + cases,
data = dat)

summary(mod_cs)

cs = aggte(mod_cs, type = "dynamic", na.rm = TRUE)
cs

ggdid(cs)

@NickCH-K
Copy link
Contributor

I'll look at this when I get a chance

@NickCH-K
Copy link
Contributor

Shifting the untreated treated-time to a big positive number seems fine to me; I do like the logic that the zeroes there let things drop out.

On the R side the sunab() function looks neat! It's a bit of a concern it doesn't match Stata, as the Stata package is by Sun herself.

@aalamer1
Copy link

aalamer1 commented Nov 1, 2021

Yes. You have to assign to very positive number that is not included in the wanted interval. Keeping this as it is will apply the regression for states that treat=0 while we just want to include states with treat=1. I tried it and got different results.

@NickCH-K
Copy link
Contributor

NickCH-K commented Nov 7, 2021

@aalamer1 do you have a fixed version of the code?

@aalamer1
Copy link

aalamer1 commented Nov 9, 2021

@NickCH-K I am working on my own dataset but modified replace time_to_treat = 0 if missing(_nfd)
to be replace time_to_treat = 10000 if missing(_nfd).

Did you try in R code they posted? I tried to replicate it but generate errors of collinearity it is not working. I do not know how did they generate the graph and confirmed the results?

@Swarup8289
Copy link

Hi all: In the code:
reghdfe asmrs ib`true_neg1'.shifted_ttt pcinc asmrh cases, a(stfips year) vce(cluster stfips)

Shouldnt there be the interaction term of " treat## ibtrue_neg1'.shifted_ttt" instead of just "ibtrue_neg1'.shifted_ttt" ?

Thanks in advance!

@NickCH-K
Copy link
Contributor

Yes, I believe you're right

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants