BBI: Should the covariate estimator be fitted on the training or test set? #12

Closed
jpaillard opened this issue Aug 23, 2024 · 2 comments


@jpaillard (Collaborator)

The covariate estimator $\hat{\mathbb{E}}[X^j \mid X^{-j}]$ is currently tuned, fitted, and used for prediction on the test set:

```python
# Current behaviour: the covariate estimator is tuned, fitted and used for
# prediction on the same test-set rows (X_test_minus_idx).
importance_models["regression"] = hypertune_predictor(
    importance_models["regression"],
    X_test_minus_idx[cur_output_ind, ...],
    output["regression"][cur_output_ind, ...],
    param_grid={"max_depth": [2, 5, 10]},
)
importance_models["regression"].fit(
    X_test_minus_idx[cur_output_ind, ...],
    output["regression"][cur_output_ind, ...],
)
X_col_pred["regression"][counter_test].append(
    importance_models["regression"].predict(
        X_test_minus_idx[cur_output_ind, ...]
    )
)
```

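This could overfit the test set: the residuals of the covariate estimator would then be artificially small, so permuting them barely perturbs $X^j$, and the conditional importance of covariate $j$ would be underestimated.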
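To make the concern concrete, here is a minimal numpy sketch, not the project's code. It assumes a linear covariate estimator fitted by least squares (in place of the tuned tree model above), independent Gaussian covariates, and a test set that is small relative to the number of covariates; `residual_variance` is a hypothetical helper. It compares the residual variance of $X^j$ on the test set when the estimator is fitted on the test set itself versus on a separate training set.

```python
import numpy as np


def residual_variance(X_fit, X_eval, j):
    """Fit a least-squares model of column j on the other columns using
    X_fit, then return the variance of its residuals on X_eval."""
    others = [k for k in range(X_fit.shape[1]) if k != j]
    # Prepend an intercept column to both design matrices.
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit[:, others]])
    A_eval = np.column_stack([np.ones(len(X_eval)), X_eval[:, others]])
    coef, *_ = np.linalg.lstsq(A_fit, X_fit[:, j], rcond=None)
    resid = X_eval[:, j] - A_eval @ coef
    return resid.var()


rng = np.random.default_rng(0)
n_train, n_test, p = 500, 60, 40
# Independent covariates: the true conditional variance of X^j is 1.
X_train = rng.standard_normal((n_train, p))
X_test = rng.standard_normal((n_test, p))

in_sample = residual_variance(X_test, X_test, j=0)  # fit and evaluate on test
held_out = residual_variance(X_train, X_test, j=0)  # fit on train, evaluate on test
print(f"fit on test:  {in_sample:.2f}")
print(f"fit on train: {held_out:.2f}")
```

With 40 fitted parameters and 60 test rows, the in-sample residual variance is expected to be roughly (60 - 40)/60 ≈ 0.33, well below the true conditional variance of 1, while the train-fitted estimate stays close to 1.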
Another solution, suggested by Ahmad, would be to cross-fit on the test set. Conversely, this could lead to a poorer fit of the covariate estimator than methods like LOCO, which use the entire training set to account for correlations between covariates.
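A minimal sketch of cross-fitting on the test set, again assuming a least-squares covariate estimator and using a hypothetical helper `cross_fit_predict` (not the project's API): the test set is split into K folds, the estimator is fitted on K - 1 folds and predicts the held-out fold, so every prediction of $X^j$ is out-of-sample. The trade-off is visible in the loop: each model only ever sees (K - 1)/K of the test rows and never the training set.

```python
import numpy as np


def cross_fit_predict(X, j, n_folds=5, seed=0):
    """Out-of-fold least-squares predictions of column j from the other columns.

    Every row is predicted by a model that did not see that row during fitting.
    """
    n, p = X.shape
    others = [k for k in range(p) if k != j]
    A = np.column_stack([np.ones(n), X[:, others]])  # design matrix with intercept
    y = X[:, j]
    pred = np.empty(n)
    # Shuffle row indices once, then split them into roughly equal folds.
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    for held_out in folds:
        fit_idx = np.setdiff1d(np.arange(n), held_out)
        coef, *_ = np.linalg.lstsq(A[fit_idx], y[fit_idx], rcond=None)
        pred[held_out] = A[held_out] @ coef
    return pred


# Example: X^0 depends linearly on X^1, plus noise with variance 0.25.
rng = np.random.default_rng(1)
X_test = rng.standard_normal((200, 5))
X_test[:, 0] = X_test[:, 1] + 0.5 * rng.standard_normal(200)
pred = cross_fit_predict(X_test, j=0)
print(f"out-of-fold residual variance: {np.var(X_test[:, 0] - pred):.2f}")
```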

Also, I find the name `importance_estimator` a bit misleading. It suggests the importance-estimation method itself (LOCO, SHAP, CPI, ...), whereas it actually refers to the estimator used in CPI to predict $X^j$ from $X^{-j}$.
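For context, a minimal sketch of how the covariate estimator's predictions are typically used in conditional permutation importance, assuming the usual scheme: $X^j$ is rebuilt as its prediction plus a row-wise permutation of the residuals, which keeps the part of $X^j$ explained by $X^{-j}$ and breaks the rest. `conditional_permutation` is a hypothetical helper, not the project's API. The last lines illustrate the overfitting concern: if the predictions match $X^j$ exactly, there is nothing left to permute.

```python
import numpy as np


def conditional_permutation(x_j, x_j_pred, rng):
    """Perturb X^j while preserving its prediction from X^{-j}.

    The residuals x_j - x_j_pred are shuffled across rows and added back to
    the predictions, so only the part of X^j not explained by X^{-j} moves.
    """
    residuals = x_j - x_j_pred
    return x_j_pred + rng.permutation(residuals)


rng = np.random.default_rng(0)
x_j = rng.standard_normal(100)
x_j_pred = 0.8 * x_j  # an imperfect estimator leaves residuals to permute
x_tilde = conditional_permutation(x_j, x_j_pred, rng)

# An estimator that overfits perfectly leaves zero residuals.
x_tilde_overfit = conditional_permutation(x_j, x_j.copy(), rng)
print(np.array_equal(x_tilde_overfit, x_j))  # True
```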

@bthirion (Contributor)

Thanks. This should be addressed in two PRs.

jpaillard mentioned this issue on Sep 18, 2024

@jpaillard (Collaborator, Author)

Closing, since this has been addressed in #14.
