-
Let's say I have the following setup:
The functional form is:
Y = theta_1 * T1 + theta_2 * T2 + theta_3 * T3 + g(X)
I would like to use double ML to do the following:
The intuition of this approach is to (i) net off the confounders X from Y, T1, T2 and T3, (ii) take the residuals and (iii) estimate the impact of the treatment variables on Y. Is this something that can be implemented with DoubleML? If so, how?
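For concreteness, steps (i)-(iii) can be sketched by hand with cross-fitted residuals. This is only an illustration of the partialling-out idea, not the DoubleML implementation; the data-generating process below is made up, and the variable names (X, T, y) are assumptions matching the setup above.

```python
# Hand-rolled sketch of steps (i)-(iii): partial X out of y and each
# treatment with out-of-fold lasso predictions, then regress residuals
# on residuals. Simulated data; true theta = (1.0, 1.5, 2.25).
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
T = X[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(size=(n, 3))  # 3 confounded treatments
y = T @ np.array([1.0, 1.5, 2.25]) + X[:, 0] + rng.normal(size=n)

# (i) net off X from y and from each treatment (out-of-fold predictions)
u = y - cross_val_predict(LassoCV(), X, y, cv=5)
V = np.column_stack([
    T[:, j] - cross_val_predict(LassoCV(), X, T[:, j], cv=5)
    for j in range(T.shape[1])
])

# (ii)-(iii) regress the outcome residuals on the treatment residuals
theta_hat = LinearRegression(fit_intercept=False).fit(V, u).coef_
print(theta_hat)  # should be close to (1.0, 1.5, 2.25)
```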
-
Some details on the multiple treatment case can be found in the user guide: https://docs.doubleml.org/stable/guide/sim_inf.html. This should then be applicable to your functional form. Note, however, that it is not explicitly imposing the form
Y = theta_1 * T1 + theta_2 * T2 + theta_3 * T3 + g(X)
as you intended. The code to estimate joint confidence intervals for the effect of (T1, T2, T3) on Y as described in the user guide is given below:
import doubleml as dml
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LassoCV, LinearRegression
np.random.seed(1234)
n_obs = 1000
dim_x = 100
X = np.random.normal(size=(n_obs, dim_x))
theta = np.array([1., 1.5, 2.25])
beta = [1 / (k**2) for k in range(1, dim_x + 1)]
gamma = [1 / (k**2) for k in range(1, dim_x + 1)]
T1 = np.dot(X, gamma) + np.random.normal(size=(n_obs,))
T2 = np.dot(X, gamma) + np.random.normal(size=(n_obs,))
T3 = np.dot(X, gamma) + np.random.normal(size=(n_obs,))
T = np.vstack([T1, T2, T3]).T
y = np.dot(T, theta) + np.dot(X, beta) + np.random.standard_normal(size=(n_obs,))
dml_data = dml.DoubleMLData.from_arrays(X, y, T)
learner = LassoCV()
ml_g = clone(learner)
ml_m = clone(learner)
dml_plr = dml.DoubleMLPLR(dml_data, ml_g, ml_m)
print(dml_plr.fit().bootstrap().confint(joint=True))
print(dml_plr.p_adjust())
print(dml_plr.p_adjust(method='bonferroni'))
The approach you suggested is not directly implemented in DoubleML, and I haven't checked the theoretical details. However, there is the option to export predictions, so I assume you can do the final stage (in the style you suggested) by hand with the following code:
dml_data = dml.DoubleMLData.from_arrays(X, y, T, use_other_treat_as_covariate=False)
learner = LassoCV()
ml_g = clone(learner)
ml_m = clone(learner)
dml_plr = dml.DoubleMLPLR(dml_data, ml_g, ml_m)
dml_plr.fit(store_predictions=True)
g_hat = dml_plr.predictions['ml_g'][:,0,0]
m_hat = dml_plr.predictions['ml_m'][:,0,:]
u_hat = y - g_hat
v_hat = T - m_hat
reg = LinearRegression(fit_intercept=False).fit(v_hat, u_hat)
reg.coef_
-
Thank you, MalteKurz. This is very helpful.
I understand this means that your first code snippet estimates each theta individually, i.e. using 3 separate OLS regressions (u_hat against v1_hat; u_hat against v2_hat; u_hat against v3_hat). In your second code snippet, we can impose the linear functional form. However, I'm not sure about the validity of the standard errors and confidence intervals for each coefficient in the regression summary; any suggestions are welcome. In summary, the theta estimated using the first code snippet might differ from the second snippet, since in the first approach we are running a regression without 'controlling' for the other residuals. Let me know if any of the points above is incorrect. Appreciate your help.
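The point about separate versus joint regressions can be illustrated numerically: when the treatment residuals are correlated, regressing u on each v_j one at a time gives different coefficients than the joint regression of u on all columns of v. The data below are made up for illustration (two treatments with a shared component z).

```python
# Separate vs joint residual regressions with correlated treatment residuals.
# True coefficients are (1.0, 2.0); the separate estimates each absorb part
# of the other treatment's effect because Cov(v1, v2) > 0.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)
v = np.column_stack([z + rng.normal(size=n), z + rng.normal(size=n)])  # correlated residuals
u = v @ np.array([1.0, 2.0]) + rng.normal(size=n)

joint = np.linalg.lstsq(v, u, rcond=None)[0]
separate = [float(v[:, j] @ u / (v[:, j] @ v[:, j])) for j in range(2)]
print(joint)     # close to (1.0, 2.0)
print(separate)  # both pulled away from the truth
```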