-
-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add numerical perturbation detector #2040
base: main
Are you sure you want to change the base?
Conversation
@kevinmessiaen Please review |
Hey @henchaves - Would you have some bandwidth to review this PR ? |
@henchaves Did you get a chance to look at this? Do I need to make any changes? |
Hello @Kranium2002, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Kranium2002, thanks so much for the contribution! I think numerical perturbation is a great addition to the library.
I've noted a few issues with the proposed implementation, in summary:
- We should check for features not only based on their data type, but column type (sometimes we have categorical features represented as numbers, but we don't want to apply the numerical perturbation to those)
- The perturbation functions used should be defined as instances of
TranformationFunction
(the same we use for text perturbation or other transformations) - We need to be careful not to alter the underlying data type. Certain transformations should be applied depending on whether the feature has integer or float values (e.g. we don't want to add Gaussian noise to integers)
I also feel there is a lot of redundancy between the BaseNumericalPerturbationDetector
calss and BaseTextPerturbationDetector
. It's probably worth refactoring this to have a common base class covering the shared behavior instead of duplicating code — @henchaves can help with that ;)
lambda x: x * 1.01, | ||
lambda x: x * 0.99, | ||
lambda x: x + np.random.normal(0, 0.01, x.shape), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be proper transformation functions (extending TransformationFunction
).
IMPORTANT: all of these perturbations assume that the numerical values are floats. This is not always the case, integer features are common: when such data types are present we must not alter that. Applying the transformations above (for example Gaussian noise) to an integer feature like num_passengers
would silently convert the data type to float, potentially breaking the model inference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how do you suggest I check perturbation for integer data?
Thank you for your review @mattbit I will work on this on the weekend. |
Co-authored-by: Matteo <[email protected]>
Co-authored-by: Matteo <[email protected]>
Co-authored-by: Matteo <[email protected]>
Co-authored-by: Matteo <[email protected]>
@mattbit Any updates on this PR? |
Hey @Kranium2002, I'll be reviewing your PR this week. Sorry for the delay! |
Description
This PR adds the numerical perturbation detector feature to robustness detector. This PR is still a work in progress.
The detector will perturb the values by 1 percent and then check for errors in classification mode and for regression the model flags any error which is 5% away from the real value. This could be made so that the user can adjust these percentages but for now this is fixed. Review needed on this.
Related Issue
closes #1846
Type of Change
Checklist
CODE_OF_CONDUCT.md
document.CONTRIBUTING.md
guide.pdm.lock
runningpdm update-lock
(only applicable whenpyproject.toml
has beenmodified)