[WIP] add test implementation of carrillo and rosenbaum #208
base: main
Conversation
)
# 5
self.actual_ = y
self.counterfactual_ = self.tau_ * self.actual_
I'm not yet certain that this does the "right" thing. C&R say you need to apply tau against P(y1, y2 | tau), and the empirical distribution of that is `actual_` (y). But I'm not sure where the kernel density re-weighting needs to come in; I need to continue working on it.
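For reference, a minimal sketch of what I think the re-weighting would look like, assuming tau enters as observation weights in a kernel density estimate of the outcomes rather than multiplying the outcomes themselves (`y` and `tau` below are stand-in arrays, not this PR's attributes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.5, size=500)   # observed outcomes (plays the role of actual_)
tau = rng.uniform(0.5, 2.0, size=500)          # stand-in for the estimated odds-ratio weights

observed_kde = stats.gaussian_kde(y)                                  # empirical density of y
counterfactual_kde = stats.gaussian_kde(y, weights=tau / tau.sum())   # tau-re-weighted density

grid = np.linspace(y.min(), y.max(), 200)
observed_density = observed_kde(grid)
counterfactual_density = counterfactual_kde(grid)
```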
hm, maybe the opposite, actually. I think of esda as the second-most central layer in the pysal stack, so it might be preferable to move some of the counterfactual generators from segregation over here instead? I rewrote them all for parallelization maybe a year ago, so they all live in one spot. That would make them easy to port into esda if they're useful elsewhere in the ecosystem?
Hi @ljwolf and @knaaptime, long time no see! I remember this was one of the first things I developed when I arrived at the CGS, although I built it as an R framework with a plotly implementation of Carrillo and Rosenbaum (2016). The main hint was that the binary dependent-variable model for the propensity score matching should "separate" the groups well; that is why the logistic regression used has some non-linear terms (the authors explained this to me by e-mail). I will look through my historical files and share them with you, ok? Did you receive the e-mail with the attached files (MATLAB and R code), @ljwolf and @knaaptime?
So, technically, I believe they are quite different, since their approach relies on matching using covariates, whereas our approach is not modeled with covariates...
Yes! good to see you (virtually) 😄
Yes! Thank you very much, @renanxcortes!! That's super helpful.
Yes, definitely, this makes sense: the "power" of the method is based on that odds-ratio weight, tau. In theory (and in this implementation), you could use any estimator that provides predicted probabilities for observation i to be in time t, given its traits x_i (the method could use trees, XGBoost, ANNs, etc.).
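To make that concrete, here is a hedged sketch of the idea: any classifier exposing `predict_proba` can supply P(t = 1 | x_i), from which an odds-ratio-style weight can be formed. The function, the variable names, and the exact weight formula are illustrative assumptions on my part, not the paper's or this PR's definitions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def odds_ratio_weights(X, t, estimator=None):
    """Fit P(t = 1 | x) with any probabilistic classifier and return odds-ratio-style weights."""
    est = LogisticRegression(max_iter=1000) if estimator is None else estimator
    est.fit(X, t)
    p_x = est.predict_proba(X)[:, 1]   # conditional probability of being in time t = 1
    p = t.mean()                       # unconditional probability of being in time t = 1
    # ratio of unconditional to conditional odds (a DFL-style re-weighting term;
    # the direction/normalization used by C&R may differ)
    return ((1 - p_x) / p_x) * (p / (1 - p))

# toy data just to show the call pattern
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
t = rng.integers(0, 2, size=400)
tau = odds_ratio_weights(X, t)

# following the authors' hint about non-linear terms, a polynomial logit (or any
# tree/boosting model) drops in without other changes:
# from sklearn.preprocessing import PolynomialFeatures
# from sklearn.pipeline import make_pipeline
# tau = odds_ratio_weights(X, t, estimator=make_pipeline(PolynomialFeatures(2),
#                                                        LogisticRegression(max_iter=1000)))
```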
I know it looks this way on the surface. But the C&R approach can just "ignore" X and still create tau and re-weight... so I wonder whether there may be a way to relate them formally using the "pooled" ecdf in the case of no exogenous information. Regardless, I'll adapt & extend the R script you sent along to correct this test implementation! And I agree w/ @knaaptime that it makes sense to have them in the same place if there's more than one... and I agree esda makes sense, but I really don't mind wherever these get put!
Awesome!
This is a test of the Carrillo and Rosenbaum (2016) counterfactual spatial distribution estimator. I'm not sure of the precise mathematical relationship between this and Cortes et al. (2021)'s cdf estimator...
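As a starting point for that comparison, a hedged sketch, assuming the C&R counterfactual can be summarized as a tau-weighted empirical CDF (`y` and `tau` are placeholder arrays, not this implementation's attributes):

```python
import numpy as np

def weighted_ecdf(y, weights=None):
    """Return sorted evaluation points and (weighted) empirical CDF values."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(y)
    return y[order], np.cumsum(w[order]) / w.sum()

rng = np.random.default_rng(1)
y = rng.normal(size=300)                # observed outcomes
tau = rng.uniform(0.5, 2.0, size=300)   # stand-in re-weighting factors

y_sorted, F_observed = weighted_ecdf(y)        # ordinary ECDF
_, F_counterfactual = weighted_ecdf(y, tau)    # tau-re-weighted counterfactual CDF
```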
Since this is general-purpose resampling, should it live in esda? I suppose that the cdf counterfactualizer is similarly generic... @renanxcortes @knaaptime @sjsrey, would you rather this kind of thing live in segregation, too?