inference on new data samples #70
That's probably the most flexible way to do it (create a forest from point set S using batch mode -> insert the new point x into each tree -> compute codisp -> delete point x). But yes, it will probably be slow. Parallelizing can help, though. I'm not sure if this is helpful, but note that the insert_point algorithm is guaranteed to produce a tree drawn from RRCF(S ∪ {x}), where S is a point set and x is an additional point. In other words, the two resulting trees are statistically indistinguishable, and the codisp of x will be the same in expectation.
Thanks for your hints. Yes, it is indeed a little slow without parallelizing. Do you know which of the steps above consumes most of the time, and what its time complexity is? I would guess the insert_point step?
Yeah, I would say insert_point is the slowest step. I have time breakdowns here:
Hi, thanks for implementing this. I have a use case that needs to train the rrcf on a given dataset and then predict on unseen data samples that were not seen during training.
Can we use batch mode to achieve that? One simple solution I can come up with is to first insert the point into the forest, calculate the codisp, and then delete it. I am wondering whether there is any smarter way to save inference time.
Thanks.