RATE with low treatment propensities --- target.sample="treated"? #1332
Comments
Hi @robert702, that's an interesting question. Since the AUTOC can be represented as a weighted ATE (equation (8) in https://arxiv.org/pdf/2111.07966.pdf), I wonder if RATE + Crump et al. (2009)'s subsetting via estimated propensities is reasonable. What do you say, @syadlowsky? You could estimate this with, for example, the following, which computes the AUTOC for units with estimated propensities larger than 0.1:

    rank_average_treatment_effect(evaluation.forest,
                                  priorities,
                                  subset = evaluation.forest$W.hat > 0.1)
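For a self-contained illustration of the call above, here is a minimal sketch on simulated data. The data-generating process, the train/evaluation split, and the names `priority.forest` and `keep` are assumptions made for this example and do not come from the thread; only the `subset = evaluation.forest$W.hat > 0.1` idea does.

```r
library(grf)

set.seed(42)
n <- 2000
p <- 5
X <- matrix(rnorm(n * p), n, p)

# Assumed setup: propensities vary with X[, 1] and are low for many units.
e <- 1 / (1 + exp(-(X[, 1] - 1.5)))
W <- rbinom(n, 1, e)
tau <- pmax(X[, 2], 0)                      # assumed heterogeneous effect
Y <- tau * W + X[, 3] + rnorm(n)

# Priorities come from a forest fit on the training half.
train <- sample(n, n / 2)
priority.forest <- causal_forest(X[train, ], Y[train], W[train], num.trees = 500)
priorities <- predict(priority.forest, X[-train, ])$predictions

# The evaluation forest is fit on the held-out half.
evaluation.forest <- causal_forest(X[-train, ], Y[-train], W[-train], num.trees = 500)

# Crump-style trimming: keep units with estimated propensity above 0.1.
keep <- evaluation.forest$W.hat > 0.1
rate <- rank_average_treatment_effect(evaluation.forest, priorities, subset = keep)
rate
```

`rate$estimate` and `rate$std.err` then refer to the retained units only, so the estimand becomes the AUTOC for the population with estimated propensity above 0.1.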
Thanks Erick, I was thinking of something like calculating the TOC "manually": computing ATEs with the function average_treatment_effect (target.sample = "treated") in the test sample, over bins built from priorities taken from the training sample. Specifically, the RATE source code has:

For the TOC, I was thinking of taking the priorities from the original forest and splitting them into 100 groups. Then, accumulating over the groups, instead of taking the average of the scores, I can calculate the average treatment effect on the treated with the correction in the average_treatment_effect function, using the option target.sample = "treated" or the "overlap" version. In the aggregate, treatment effects with target.sample = "treated" and target.sample = "overlap" indeed give very similar results.

Does this seem like a reasonable approach to you? Or is there any conceptual misunderstanding? Here is a rough script:

    rm(list = ls())
    n <- 15000
    # ...
    priority.cate <- 1 * predict(cf.priority, X[-train, ])$predictions
    # include.lowest = TRUE keeps the smallest priority from being binned as NA
    centile <- cut(priority.cate,
                   breaks = quantile(priority.cate, probs = seq(0, 1, by = 0.01)),
                   labels = FALSE, include.lowest = TRUE)
    summary(centile)
    prioritygroup <- 101 - centile
    cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])
    ATE <- as.numeric(average_treatment_effect(cf.eval, target.sample = "treated")[1])
    TOC <- numeric(100)
    for (i in 1:100) {
      # ...
    }
    plot(TOC, type = "l", xlab = "Priority group", ylab = "ATE of priority group - ATE", main = "TOC")
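One way the "manual" TOC described above could be filled in is sketched below. This is an illustration of the proposal, not grf's RATE implementation: the simulated data, the constant propensity of 0.1, the use of 10 groups instead of 100 (so that every group contains treated units), and the rank-based grouping are all assumptions made for the example. It also produces no standard errors, whereas `rank_average_treatment_effect` reports bootstrap standard errors.

```r
library(grf)

set.seed(1)
n <- 4000
p <- 5
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.1)                      # assumed low, constant propensity
tau <- pmax(X[, 1], 0)                      # assumed heterogeneous effect
Y <- tau * W + X[, 2] + rnorm(n)

# Priorities: CATE predictions from a forest trained on one half.
train <- sample(n, n / 2)
cf.priority <- causal_forest(X[train, ], Y[train], W[train], num.trees = 500)
priority.cate <- predict(cf.priority, X[-train, ])$predictions

# Equal-sized groups by rank; group 1 holds the highest priorities.
n.groups <- 10
n.eval <- length(priority.cate)
bin <- ceiling(n.groups * rank(priority.cate, ties.method = "first") / n.eval)
priority.group <- n.groups + 1 - bin

# Evaluation forest on the held-out half.
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train], num.trees = 500)
ate <- average_treatment_effect(cf.eval, target.sample = "treated")[["estimate"]]

# Cumulative ATE on the treated among the top i groups, minus the overall one.
TOC <- numeric(n.groups)
for (i in seq_len(n.groups)) {
  top <- priority.group <= i
  ate.top <- average_treatment_effect(cf.eval, target.sample = "treated", subset = top)
  TOC[i] <- ate.top[["estimate"]] - ate
}

plot(TOC, type = "l", xlab = "Priority group",
     ylab = "ATE of priority group - ATE", main = "TOC")
```

By construction the last point of the curve is zero, because the top `n.groups` groups cover the whole evaluation sample.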
My immediate reaction would be to just do what's posted above, that's one of the reasons I added the |
Thanks Erick. I imagine that could work when there are enough observations with propensities above 0.1. As I was saying earlier, in my setting the mass of propensities is at 0.02, so the simple subsetting you proposed would not work. I could either take a random sample of the control group to get a more balanced design, or use the suggestion from the documentation of the average_treatment_effect function, as I described in my previous post: target.sample = "treated" or target.sample = "overlap". Any thoughts on which of the two would be more appropriate? Or on alternative approaches when there are basically no observations with propensities > 0.1? Thanks in advance!
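The first option, randomly subsampling controls before running RATE, could look roughly like the sketch below. The simulated data, the 4:1 control-to-treated ratio, and all variable names are assumptions for illustration. Because treatment is randomized and controls are dropped independently of their outcomes, this should not introduce confounding, but it does discard control data; whether it is preferable to the target.sample options remains the open question here.

```r
library(grf)

set.seed(7)
n <- 20000
p <- 4
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.02)                     # randomized, propensity about 0.02
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

# Keep every treated unit and a random 4:1 sample of controls (assumed ratio).
treated <- which(W == 1)
controls <- which(W == 0)
keep <- c(treated, sample(controls, 4 * length(treated)))
Xs <- X[keep, ]
Ys <- Y[keep]
Ws <- W[keep]
mean(Ws)                                    # share treated in the subsample

# Usual train/evaluation split on the rebalanced sample.
train <- sample(length(keep), floor(length(keep) / 2))
priority.forest <- causal_forest(Xs[train, ], Ys[train], Ws[train], num.trees = 500)
priorities <- predict(priority.forest, Xs[-train, ])$predictions

evaluation.forest <- causal_forest(Xs[-train, ], Ys[-train], Ws[-train], num.trees = 500)
rate <- rank_average_treatment_effect(evaluation.forest, priorities)
rate
```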
I am using causal_forest for an RCT where the treatment group has a very low treatment propensity: N control is 1 million, N treatment is 20,000.
When I calculate average_treatment_effect, I get a warning that I should use the option target.sample = "treated". The resulting estimate is indeed much different from the overall average_treatment_effect (despite randomization), and it is also closer to what I get using OLS, which makes sense.
I now want to use RATE to evaluate the presence of heterogeneity. I wonder if I should make any adjustment to account for the low treatment propensities. If there is no built-in option, I could go to the source code myself, but any guidance on whether something like this is needed would be greatly appreciated.
Thanks.
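For reference, the sample sizes in the question imply a treatment share of 20,000 / 1,020,000 ≈ 0.0196, which matches the propensity mass near 0.02 mentioned in the comments. The comparison described in the question (overall ATE, ATE on the treated, and OLS) can be sketched on simulated data as follows. The data-generating process and the much smaller sample size are assumptions chosen so the example runs quickly.

```r
library(grf)

set.seed(123)
n <- 10000
p <- 4
X <- matrix(rnorm(n * p), n, p)
W <- rbinom(n, 1, 0.02)                     # randomized, propensity about 0.02
Y <- pmax(X[, 1], 0) * W + X[, 2] + rnorm(n)

cf <- causal_forest(X, Y, W, num.trees = 500)

# With propensities this low, the "all" estimate may come with a warning
# suggesting target.sample = "treated".
ate.all <- average_treatment_effect(cf, target.sample = "all")
ate.treated <- average_treatment_effect(cf, target.sample = "treated")
ate.overlap <- average_treatment_effect(cf, target.sample = "overlap")

# Simple difference in means via OLS, valid under randomization.
ols <- coef(lm(Y ~ W))[["W"]]

rbind(all = ate.all, treated = ate.treated, overlap = ate.overlap)
ols
```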