BBI: Should the covariate estimator be fitted on the training or test set ? #12

jpaillard · 2024-08-23T09:08:09Z

The covariate estimator $\hat{\mathcal{E}}[X^j|X^{-j}]$ is currently both fitted on the test set and used to predict on that same test set:

hidimstat/hidimstat/compute_importance.py

Lines 251 to 265 in 7c3af80

    
    importance_models["regression"] = hypertune_predictor(
        importance_models["regression"],
        X_test_minus_idx[cur_output_ind, ...],
        output["regression"][cur_output_ind, ...],
        param_grid={"max_depth": [2, 5, 10]},
    )
    importance_models["regression"].fit(
        X_test_minus_idx[cur_output_ind, ...],
        output["regression"][cur_output_ind, ...],
    )
    X_col_pred["regression"][counter_test].append(
        importance_models["regression"].predict(
            X_test_minus_idx[cur_output_ind, ...]
        )
    )

This could lead to overfitting on the test set and to underestimating the conditional importance of covariate $j$.
Another solution, suggested by Ahmad, would be to cross-fit on the test set. However, this could give a poorer fit of the covariate estimator than methods like LOCO, which leverage the entire train set to account for correlations.
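
To make the three options concrete, here is a minimal sketch on toy data. It is not the hidimstat implementation: the data, the variable names (`X_train_minus_j`, `make_estimator`, ...) and the choice of a scikit-learn `DecisionTreeRegressor` are assumptions for illustration only. It compares the variance of the residual $X^j - \hat{\mathcal{E}}[X^j|X^{-j}]$ on the test set when the covariate estimator is (1) fitted and evaluated on the test set, (2) fitted on the train set, (3) cross-fitted on the test set.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_train, n_test, n_features, j = 400, 100, 5, 0

# Hypothetical toy data: equicorrelated Gaussian features.
cov = 0.6 * np.ones((n_features, n_features)) + 0.4 * np.eye(n_features)
X_train = rng.multivariate_normal(np.zeros(n_features), cov, size=n_train)
X_test = rng.multivariate_normal(np.zeros(n_features), cov, size=n_test)


def split_covariate(X, idx):
    """Return (X^{-j}, X^j) for column idx."""
    return np.delete(X, idx, axis=1), X[:, idx]


def make_estimator():
    # Assumed covariate estimator; any regressor could be used here.
    return DecisionTreeRegressor(max_depth=10, random_state=0)


X_train_minus_j, x_train_j = split_covariate(X_train, j)
X_test_minus_j, x_test_j = split_covariate(X_test, j)

predictions = {
    # Option 1: fit and predict on the same test samples (behaviour described above).
    "fit_on_test": make_estimator()
    .fit(X_test_minus_j, x_test_j)
    .predict(X_test_minus_j),
    # Option 2: fit on the train set, predict on the test set.
    "fit_on_train": make_estimator()
    .fit(X_train_minus_j, x_train_j)
    .predict(X_test_minus_j),
    # Option 3: cross-fit on the test set (out-of-fold predictions).
    "cross_fit_on_test": cross_val_predict(
        make_estimator(), X_test_minus_j, x_test_j, cv=5
    ),
}

# Variance of the residual X^j - E_hat[X^j | X^{-j}] on the test set.
residual_var = {
    name: float(np.var(x_test_j - pred)) for name, pred in predictions.items()
}
for name, value in residual_var.items():
    print(f"{name}: residual variance = {value:.3f}")
```

In this sketch the in-sample residuals (option 1) are much smaller than the out-of-sample ones, because a deep tree nearly memorizes the test samples. Smaller residuals mean the perturbed $X^j$ stays close to the original, which is one way the conditional importance could end up underestimated.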

Also, I find the name importance_estimator a bit misleading. It suggests the method used for importance estimation (LOCO, SHAP, CPI...), but it actually refers to the estimator used in CPI to predict $X^j$ from $X^{-j}$.


bthirion · 2024-08-23T19:40:59Z

Thx. This should be addressed in 2 PRs.

jpaillard · 2024-10-15T19:25:45Z

Closing since it has been addressed in #14

jpaillard mentioned this issue Sep 18, 2024

Refactor CPI #14

Merged

jpaillard closed this as completed Oct 15, 2024
