Questions on implementation of econML DML #1289

abhilasha-workday · 2024-12-03T18:17:05Z

Hello,

I am currently using the DoWhy library for a causal analysis in which my treatment is continuous and my outcome is binary. To estimate the effect I used two methods: logistic regression through GLM, and DML. The code snippets for both are below:

Logistic Regression
import statsmodels.api as sm

estimate = model.estimate_effect(
    est_ident,
    method_name="backdoor.generalized_linear_model",
    test_significance=True,
    method_params={
        'num_null_simulations': 20,
        'num_simulations': 20,
        'num_quantiles_to_discretize_cont_cols': 10,
        'fit_method': "statsmodels",
        'glm_family': sm.families.Binomial(),  # logistic regression
        'need_conditional_estimates': False
    },
    control_value=0.2,
    treatment_value=0.3
)
print(estimate)

DML
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

dml_estimate = model.estimate_effect(
    est_ident,
    method_name="backdoor.econml.dml.DML",
    control_value=0.1,
    treatment_value=0.2,
    confidence_intervals=False,
    method_params={
        "init_params": {
            'model_y': GradientBoostingClassifier(random_state=101),
            'model_t': GradientBoostingRegressor(random_state=101),
            "model_final": LassoCV(random_state=101),
            'featurizer': PolynomialFeatures(degree=1, include_bias=True),
            'discrete_treatment': False,
            'random_state': 101
        },
        "fit_params": {}
    }
)
print(dml_estimate)

Could you elaborate on how to interpret the ATE for both of these methods? For example, if my ATE is -0.02, is it okay to say that 'Increasing the treatment from 0.1 to 0.2 leads to a 2% decrease in the outcome'?
Is the ATE returned by the logistic regression model actually the coefficient of the treatment variable in the model? Also, when I compare the coefficient returned by estimate.estimator.model for the GLM estimator with the mean estimate returned by estimate_effect(), they are drastically different. Is that expected behavior?
I wanted to create a sort of dose-response curve to see how a change in treatment affects my outcome. For example, how does my outcome change when I increase my treatment from 0.1 to 0.2, 0.2 to 0.3, 0.3 to 0.4, and so on? To do this, I changed the values of the control_value and treatment_value parameters in estimate_effect(). When I do so, I get a different ATE for each interval with logistic regression but the same ATE with DML. Why is that?
How does DoWhy work behind the scenes to calculate the ATE when control_value and treatment_value are provided? I am especially interested in the GLM and DML effect estimation methods.
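
To make the dose-response question concrete, here is a minimal sketch of the interval sweep using plain NumPy stand-ins rather than DoWhy itself. It assumes that a logistic model's effect is the difference in predicted probability between the two treatment values, and that a linear final-stage model's effect is a slope times the change in treatment. The coefficients and the helper names (logistic_effect, linear_effect) are made up for illustration and are not part of DoWhy or EconML.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration (not fitted to any data).
INTERCEPT, SLOPE = -1.0, 3.0   # stand-in logistic model: P(y=1 | t) = sigmoid(INTERCEPT + SLOPE * t)
THETA = -0.2                   # stand-in linear effect per unit of treatment


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def logistic_effect(control, treatment):
    """Difference in predicted probability between two treatment values."""
    return sigmoid(INTERCEPT + SLOPE * treatment) - sigmoid(INTERCEPT + SLOPE * control)


def linear_effect(control, treatment):
    """Effect under a model that is linear in the treatment."""
    return THETA * (treatment - control)


# Consecutive intervals of equal width: 0.1-0.2, 0.2-0.3, 0.3-0.4, 0.4-0.5.
grid = [0.1, 0.2, 0.3, 0.4, 0.5]
intervals = list(zip(grid[:-1], grid[1:]))

logistic_effects = [logistic_effect(c, t) for c, t in intervals]
linear_effects = [linear_effect(c, t) for c, t in intervals]

for (c, t), le, li in zip(intervals, logistic_effects, linear_effects):
    print(f"{c:.1f} -> {t:.1f}: logistic {le:+.4f}, linear {li:+.4f}")
```

Under these assumptions the linear stand-in gives the same value for every equal-width interval, while the logistic stand-in varies with where the interval sits on the curve. Whether this is what DoWhy does internally for either estimator is exactly what I would like to confirm.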

Looking forward to hearing back on this. TIA!

abhilasha-workday added the question Further information is requested label Dec 3, 2024
