QUESTION: Feature importance #90

Open
stianvale opened this issue Jun 7, 2021 · 3 comments

Comments

@stianvale

Hi, and thanks for building this great repo!

I have a general question: what's the proper way to compute feature importance for RRCF?
Basically, I want to know which features contribute the most to the collusive displacement (codisp) value.

@mdbartos
Member

To clarify, do you mean: for a set of multidimensional points, which dimension contributes the most to the total codisp over all points in the dataset?

These three pages of the docs may be useful:
Tree construction: https://klabum.github.io/rrcf/tree-construction.html
Anomaly scoring: https://klabum.github.io/rrcf/anomaly-scoring.html
Caveats: https://klabum.github.io/rrcf/caveats.html

Perhaps it would be helpful to specify a (mathematical) definition of feature importance for your problem of interest. Or perhaps you can describe the particular problem you are trying to solve.
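As a starting point for such a definition, the score under discussion can be written as follows. This is a sketch that assumes the tree-based form used by rrcf's `codisp` method, in which removing a subtree $C$ displaces exactly the points in its sibling subtree. The notation $\operatorname{sib}(C)$ is introduced here for illustration and is not taken from the docs.

```math
\operatorname{CoDisp}(x) = \mathbb{E}_{T}\left[\max_{C \,:\, x \in C,\; C \neq T} \frac{|\operatorname{sib}(C)|}{|C|}\right],
\qquad
\text{total codisp} = \sum_{x \in S} \operatorname{CoDisp}(x)
```

Here $C$ ranges over the subtrees on the path from the leaf of $x$ up to, but not including, the root of a random cut tree $T$, and $S$ is the dataset. In practice the expectation is estimated by averaging the per-tree value over the trees of a forest. Neither expression contains a per-dimension term, so any notion of feature importance has to be defined on top of it.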

@stianvale
Author

Thanks for your reply, @mdbartos!

Yeah, what I'm asking is: for a given multidimensional point, which dimensions contribute the most to that point's codisp?

I have a draft approach that simply compares each of the point's dimension values with the mean value of that dimension across all points. This shows which dimensions differ the most from 'normal' behavior, but it is only a temporary proxy for feature importance.
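The mean-comparison proxy can be sketched roughly as below. This is a minimal illustration and not part of rrcf. The function name `deviation_from_mean` is hypothetical. Dividing each difference by that dimension's standard deviation is an added assumption so that dimensions on different scales can be compared.

```python
from statistics import mean, pstdev


def deviation_from_mean(points, query):
    """Rank dimensions by |query - mean| in units of each dimension's std. dev."""
    scores = []
    for d in range(len(query)):
        column = [p[d] for p in points]
        sigma = pstdev(column) or 1.0  # constant column: fall back to raw distance
        scores.append(abs(query[d] - mean(column)) / sigma)
    ranking = sorted(range(len(query)), key=lambda d: scores[d], reverse=True)
    return ranking, scores


# Usage: the query is far from the data in dimension 0 and at the mean in dimension 1.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
ranking, scores = deviation_from_mean(points, (5, 0.5))
print(ranking, scores)  # → [0, 1] [9.0, 0.0]
```

This proxy never looks at the trees. It measures distance from the mean rather than anything the forest actually used to isolate the point.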

So what I'm asking is whether there is some way to deduce a point's feature importances from the codisp formula.
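One possible heuristic follows the structure of the score itself. In each tree, codisp is the maximum over the point's ancestors of |sibling| / |subtree|. Each of those ratios is created by one cut on one dimension. You can credit that dimension for the maximizing ratio and then tally the credits over many trees. The sketch below does this under stated assumptions.

Everything here is hypothetical and is not an rrcf feature. The tree builder is a simplified stand-in for rrcf's random cut trees: the cut dimension is chosen in proportion to its range, and the cut value is drawn uniformly within that range. Names such as `dimension_attribution` are made up for illustration.

```python
import random
from collections import Counter


class Node:
    """A leaf stores its point count; a branch also stores its cut (dim, value)."""

    def __init__(self, indices, dim=None, value=None, left=None, right=None):
        self.n = len(indices)
        self.dim, self.value = dim, value
        self.left, self.right = left, right
        self.parent = None


def build_tree(points, indices, rng):
    """Recursively build a simplified random cut tree over points[indices]."""
    if len(indices) == 1:
        return Node(indices)
    dims = range(len(points[0]))
    lo = [min(points[i][d] for i in indices) for d in dims]
    hi = [max(points[i][d] for i in indices) for d in dims]
    spans = [h - l for l, h in zip(lo, hi)]
    if sum(spans) == 0:
        return Node(indices)  # all remaining points are duplicates
    # Choose the cut dimension with probability proportional to its span.
    dim = rng.choices(dims, weights=spans)[0]
    while True:  # redraw in the rare case the cut leaves one side empty
        value = rng.uniform(lo[dim], hi[dim])
        left = [i for i in indices if points[i][dim] <= value]
        right = [i for i in indices if points[i][dim] > value]
        if left and right:
            break
    node = Node(indices, dim, value,
                build_tree(points, left, rng), build_tree(points, right, rng))
    node.left.parent = node.right.parent = node
    return node


def find_leaf(root, point):
    """Descend using the stored cuts until reaching the leaf holding `point`."""
    node = root
    while node.left is not None:
        node = node.left if point[node.dim] <= node.value else node.right
    return node


def codisp_with_dim(leaf):
    """Per-tree CoDisp plus the cut dimension that produced the maximum ratio."""
    best, best_dim = 0.0, None
    node = leaf
    while node.parent is not None:
        parent = node.parent
        sibling = parent.right if node is parent.left else parent.left
        ratio = sibling.n / node.n
        if ratio > best:  # strict '>' keeps the level closest to the leaf on ties
            best, best_dim = ratio, parent.dim
        node = parent
    return best, best_dim


def dimension_attribution(points, query_index, num_trees=100, seed=0):
    """Average CoDisp over a forest and count which dimension 'won' in each tree."""
    rng = random.Random(seed)
    counts, total = Counter(), 0.0
    all_indices = list(range(len(points)))
    for _ in range(num_trees):
        root = build_tree(points, all_indices, rng)
        score, dim = codisp_with_dim(find_leaf(root, points[query_index]))
        total += score
        if dim is not None:
            counts[dim] += 1
    return total / num_trees, counts


# Usage: 64 points in the unit square plus one point that is extreme only in dimension 0.
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(64)]
data.append((10.0, 0.5))
mean_codisp, counts = dimension_attribution(data, len(data) - 1, num_trees=100)
print(f"mean CoDisp = {mean_codisp:.1f}", dict(counts))
```

This is only one way to define importance. It credits a single cut per tree and ignores the other ratios along the path. rrcf's own trees are assumed here to expose the cut dimension on each branch (attribute `q`), so the same walk could in principle be applied to an `RCTree`, but that is untested.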

Does that make sense?

@stianvale
Author

Hi again! @mdbartos, have you ever experimented with computing the feature importance of a particular point?
I think this would be a great addition to the current library in terms of improving the explainability of the anomalies.
