-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor CPI #14
Refactor CPI #14
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #14 +/- ##
===========================================
- Coverage 91.79% 76.35% -15.45%
===========================================
Files 44 46 +2
Lines 2926 2398 -528
===========================================
- Hits 2686 1831 -855
- Misses 240 567 +327 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are needed for the added code.
from hidimstat.CPI import CPI | ||
|
||
# %% | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide a title for each part
|
||
|
||
def compute_pval(vim): | ||
mean_vim = np.mean(vim, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mini docstring welcome
y_train, y_test = y[train_index], y[test_index] | ||
cpi = CPI( | ||
estimator=regressor_list[i], | ||
# covariate_estimator=RidgeCV(alphas=np.logspace(-3, 3, 10)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove commented lines
hidimstat/CPI.py
Outdated
random_state: int = None, | ||
n_jobs: int = 1 | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write complete doctrings
@@ -0,0 +1,104 @@ | |||
# %% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find a better name for the example.
I believe that this example is meant to replace the other one ?
I tried to address the different comments:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx !
Maybe I misunderstand some parts of the code you're adding. We should discuss about it.
@@ -0,0 +1,173 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of having legacy examples ?
hidimstat/CPI.py
Outdated
output_dict["loss_reference"] = loss_reference | ||
output_dict['loss_perm'] = dict() | ||
|
||
def joblib_predict_one_gp(estimator, X, y, j): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def joblib_predict_one_gp(estimator, X, y, j): | |
def _joblib_predict_one_gp(estimator, X, y, j): |
hidimstat/LOCO.py
Outdated
self.list_estimators = [clone(self.estimator) | ||
for _ in range(self.nb_groups)] | ||
|
||
def joblib_fit_one_gp(estimator, X, y, j): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def joblib_fit_one_gp(estimator, X, y, j): | |
def _joblib_fit_one_gp(estimator, X, y, j): |
hidimstat/test/test_LOCO.py
Outdated
@@ -0,0 +1,48 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename to test_loco.py
hidimstat/test/test_CPI.py
Outdated
@@ -0,0 +1,53 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename to test_cpi.py
hidimstat/test/test_LOCO.py
Outdated
from hidimstat.LOCO import LOCO | ||
|
||
|
||
def test_LOCO(linear_scenario): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to use snale_case or CamelCase, but not both in the same string. -> test_loco
@@ -0,0 +1,50 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_permutation_importance
from hidimstat.PermutationImportance import PermutationImportance | ||
|
||
|
||
def test_CPI(linear_scenario): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_CPI(linear_scenario): | |
def test_cpi(linear_scenario): |
Also linked to #17 |
Leave me a bit of time, it is a big one... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a great step forward !
I have some suggestions on the API. LMK what you think.
hidimstat/cpi.py
Outdated
the others. | ||
""" | ||
if self.groups is None: | ||
self.nb_groups = X.shape[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.nb_groups = X.shape[1] | |
self.n_groups = X.shape[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And change accordingly nb_groups -> n_groups wherever needed
hidimstat/cpi.py
Outdated
else: | ||
self.nb_groups = len(self.groups) | ||
# create a list of covariate estimators for each group if not provided | ||
if len(self.list_imputation_mod) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like self.list_imputation_mod
Maybe self.list_imputation_models
bzw, the user is not supposed to manipulate these models ? So this one could be an internal variable,
self._list_imputation_models
?
hidimstat/cpi.py
Outdated
clone(self.imputation_model) for _ in range(self.nb_groups) | ||
] | ||
|
||
def joblib_fit_one_gp(estimator, X, y, j): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def joblib_fit_one_gp(estimator, X, y, j): | |
def joblib_fit_one_group(estimator, X, y, j): |
hidimstat/cpi.py
Outdated
list must match the number of covariates. | ||
n_perm: int, default=50 | ||
Number of permutations to perform. | ||
groups: dict, default=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that groups are a data-dependent thing and should be provided at fit time ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point I will then move the self._list_imputation_models
to the .fit
function.
Maybe it could also support clustering methods rather than predefined groups (for a next PR).
hidimstat/cpi.py
Outdated
for m in self.list_imputation_mod: | ||
check_is_fitted(m) | ||
|
||
def joblib_predict_one_gp(imputation_model, X, j): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def joblib_predict_one_gp(imputation_model, X, j): | |
def _joblib_predict_one_gp(imputation_model, X, j): |
@achamma723 your opinion is welcome. |
I tried to address all your comments @bthirion |
Hello @bthirion and @jpaillard, sorry I weren't able to see all the comments in the past week. I'll try this weekend to have a look if you didn't decide to merge yet |
hidimstat/cpi.py
Outdated
provided, it will be cloned for each covariate. Otherwise, a list of | ||
potentially different estimators can be provided, the length of the | ||
list must match the number of covariates. | ||
n_perm: int, default=50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to call it n_permutations?
hidimstat/cpi.py
Outdated
|
||
Returns | ||
------- | ||
output_dict: dict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the return parameter for the score function (replicated below).
hidimstat/permutation_importance.py
Outdated
def __init__( | ||
self, | ||
estimator, | ||
n_perm: int = 50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, n_permutations?
hidimstat/permutation_importance.py
Outdated
Returns | ||
------- | ||
output_dict: dict | ||
A dictionary containing the following keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for the return parameter between the predict and the score functions
Hello @jpaillard and @bthirion, I highlighted minor comments when passing, overall I think it is a great stable step for the future benchmarks. As for the cpi, now it is limited to the idea of reconstructing the variable or group of interest by the mean of the residuals, thus maybe the idea of the sampling (that existed via the Modified RF) is also interesting to push in the next steps? |
Thx for the comments @achamma723. I made the modifications you suggested. Indeed it could be worth adding the sampling from nodes of the RF. Maybe we could tackle this in a following PR ? |
Ready to merge on my side @bthirion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're converging. There are just a few missing docstrings and we should include tests for all lines.
I added the additional tests and docstring @bthirion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The last pending thing is the ValueError("fit must be called before predict") to be tested, but this could be handled in a forthcoming PR
I suggest a refactoring of the CPI functionality. It is inspired by the current implementation of permutation importance in scikit-learn.
It aims at:
I also updated the example using the diabetes dataset.
Linked to issues #12 and #13