Questions on implementation of econML DML #1289

Open
abhilasha-workday opened this issue Dec 3, 2024 · 0 comments
Labels
question Further information is requested

@abhilasha-workday

Hello,

I am currently using the DoWhy library for a causal analysis where my treatment is continuous and my outcome is binary. To estimate the effect I used two methods: logistic regression through a GLM, and DML. Code snippets for both are below:

Logistic Regression
```python
import statsmodels.api as sm

# `model` is the dowhy CausalModel and `est_ident` its identified estimand (defined earlier).
estimate = model.estimate_effect(
    est_ident,
    method_name="backdoor.generalized_linear_model",
    test_significance=True,
    method_params={
        'num_null_simulations': 20,
        'num_simulations': 20,
        'num_quantiles_to_discretize_cont_cols': 10,
        'fit_method': "statsmodels",
        'glm_family': sm.families.Binomial(),  # logistic regression
        'need_conditional_estimates': False,
    },
    control_value=0.2,
    treatment_value=0.3,
)
print(estimate)
```
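Related to the questions below on how the GLM ATE relates to the treatment coefficient: a minimal sketch of one common way a regression-based estimator turns a fitted logistic model into an ATE. The treatment is set to `control_value` and then to `treatment_value` for every row, the predicted probabilities are averaged, and the two averages are subtracted. It is an assumption that DoWhy's `backdoor.generalized_linear_model` works exactly this way; the coefficients `B0`, `B_T`, `B_W`, the covariate values `W`, and the helpers `mean_prob` and `ate_logistic` are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted logistic model: logit P(Y=1) = B0 + B_T * T + B_W * W
B0, B_T, B_W = -1.0, -0.8, 0.5
W = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical observed confounder values


def mean_prob(t):
    """Average predicted P(Y=1) with every row's treatment set to t."""
    return sum(sigmoid(B0 + B_T * t + B_W * w) for w in W) / len(W)


def ate_logistic(control_value, treatment_value):
    """Difference in mean predicted probabilities (probability-scale ATE)."""
    return mean_prob(treatment_value) - mean_prob(control_value)


for t0, t1 in [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]:
    # The coefficient is on the log-odds scale and is the same for every interval;
    # the probability-scale ATE is much smaller and changes with the interval.
    print(f"{t0}->{t1}: ATE={ate_logistic(t0, t1):.5f}, coefficient={B_T}")
```

Under these assumptions the log-odds coefficient (-0.8) and the probability-scale ATE are on different scales, and because the sigmoid is nonlinear, equal-width treatment intervals give different ATEs.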

DML
```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

dml_estimate = model.estimate_effect(
    est_ident,
    method_name="backdoor.econml.dml.DML",
    control_value=0.1,
    treatment_value=0.2,
    confidence_intervals=False,
    method_params={
        "init_params": {
            'model_y': GradientBoostingClassifier(random_state=101),  # binary outcome
            'model_t': GradientBoostingRegressor(random_state=101),   # continuous treatment
            "model_final": LassoCV(random_state=101),
            'featurizer': PolynomialFeatures(degree=1, include_bias=True),
            'discrete_treatment': False,
            'random_state': 101,
        },
        "fit_params": {},
    },
)
print(dml_estimate)
```
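Related to the question below on why DML returns the same ATE for different intervals: in DML the final stage models the outcome residual as theta(X) times the treatment residual, so the effect of moving from T0 to T1 has the form theta(X) * (T1 - T0). A minimal partialling-out sketch of that idea follows. It is a simplification, not EconML's implementation: there is no cross-fitting, the nuisance models are plain OLS fits via `numpy.linalg.lstsq` instead of gradient boosting, and the synthetic data (true effect of -0.3 per unit of T on P(Y=1)) and the helpers `residualize` and `ate_linear_final` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(101)
n = 20_000

# Hypothetical synthetic data: W confounds T and Y; the true effect of T on
# P(Y=1) is -0.3 per unit of T (a linear probability model, so theta is constant).
W = rng.uniform(0.0, 1.0, n)
T = 0.5 * W + rng.uniform(0.0, 0.5, n)
Y = rng.binomial(1, 0.4 + 0.2 * W - 0.3 * T).astype(float)


def residualize(target, covariates):
    """Residuals of an OLS fit of target on [1, covariates] (stand-in for model_y / model_t)."""
    X = np.column_stack([np.ones(len(target)), covariates])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef


# Partialling out: regress the outcome residuals on the treatment residuals.
y_res = residualize(Y, W)
t_res = residualize(T, W)
theta = float(t_res @ y_res / (t_res @ t_res))


def ate_linear_final(control_value, treatment_value):
    """With a final stage linear in T, the effect is theta * (T1 - T0)."""
    return theta * (treatment_value - control_value)


print(f"theta ~ {theta:.3f}")
for t0, t1 in [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]:
    print(f"{t0}->{t1}: ATE={ate_linear_final(t0, t1):.4f}")
```

Under this linear-in-T assumption, every interval of width 0.1 yields the same ATE (theta * 0.1), which matches the DML behaviour described in question 3; capturing a curved dose-response would need a final stage that is nonlinear in the treatment.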

  1. Could you elaborate on how to interpret the ATE for both of these methods? For example, if my ATE is -0.02, is it okay to say that "increasing the treatment from 0.1 to 0.2 leads to a 2% decrease in the outcome"?
  2. Is the ATE returned by the logistic regression model actually the coefficient of the treatment variable? Also, when I compare the treatment coefficient in estimate.estimator.model for the GLM estimator with the mean estimate returned by estimate_effect(), they are drastically different. Is that expected behavior?
  3. I want to create a kind of dose-response curve to see how changes in the treatment affect my outcome, e.g. how the outcome changes when I increase the treatment from 0.1 to 0.2, 0.2 to 0.3, 0.3 to 0.4, etc. To do this, I changed the control_value and treatment_value parameters in estimate_effect(). When I do so, I get a different ATE for each interval with logistic regression but the same ATE with DML. Why is that?
  4. How does DoWhy compute the ATE internally when control_value and treatment_value are provided? I am especially interested in the GLM and DML effect estimation methods.
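For the dose-response curve in question 3, a minimal sketch of sweeping consecutive intervals (and, as an alternative, comparing every level against a fixed baseline). The helpers `dose_response` and `dose_response_from_baseline` and the `toy_effect` stand-in are hypothetical and only make the sketch runnable on its own.

```python
from typing import Callable, List, Sequence, Tuple

Curve = List[Tuple[float, float, float]]


def dose_response(estimate_fn: Callable[[float, float], float],
                  grid: Sequence[float]) -> Curve:
    """Effect for each consecutive pair (grid[i], grid[i + 1])."""
    return [(t0, t1, estimate_fn(t0, t1)) for t0, t1 in zip(grid, grid[1:])]


def dose_response_from_baseline(estimate_fn: Callable[[float, float], float],
                                grid: Sequence[float]) -> Curve:
    """Effect of each level in grid[1:] relative to the baseline grid[0]."""
    base = grid[0]
    return [(base, t, estimate_fn(base, t)) for t in grid[1:]]


# Hypothetical stand-in for a real estimator, used only so the sketch runs.
def toy_effect(t0: float, t1: float) -> float:
    return -0.2 * (t1 ** 2 - t0 ** 2)


for t0, t1, effect in dose_response(toy_effect, [0.1, 0.2, 0.3, 0.4]):
    print(f"{t0}->{t1}: {effect:.4f}")
```

With DoWhy, `estimate_fn` could wrap the existing call, e.g. `lambda t0, t1: model.estimate_effect(est_ident, method_name=..., method_params=..., control_value=t0, treatment_value=t1).value`. Listing the grid explicitly (rather than building it with repeated float addition) keeps the interval endpoints exact.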

Looking forward to hearing back on this. TIA!
