
inference on new data samples #70

Open
sophark opened this issue Dec 16, 2019 · 3 comments

Comments


sophark commented Dec 16, 2019

Hi, thanks for implementing this. I have a use case that requires training the rrcf forest on a given dataset and then predicting on unseen data samples that were not available during the training period.

Can we use batch mode to achieve that? One simple solution I can come up with is to first insert the point into the forest, compute the codisp, and then delete it. I am wondering whether there is a smarter way to save inference time?

Thanks.


mdbartos commented Dec 17, 2019

That's probably the most flexible way to do it (create forest from point set S using batch mode -> insert new point x into each tree -> compute codisp -> delete point x). But yes, it will probably be slow. Parallelizing can help though.

I'm not sure if this is helpful, but note that the insert_point algorithm is guaranteed to produce a tree drawn from RRCF(S ∪ {x}), where S is a point set and x is an additional point.

In other words, the following two trees are statistically indistinguishable (drawn from the same distribution), and the codisp of x will be the same in expectation:

  • Create tree T' from point set (S ∪ {x}) via batch mode.
  • Create tree T'' from point set S via batch mode, then insert x.


sophark commented Dec 17, 2019

> That's probably the most flexible way to do it (create forest from point set S using batch mode -> insert new point x into each tree -> compute codisp -> delete point x). But yes, it will probably be slow. Parallelizing can help though.

Thanks for your hints. Yes, it is indeed a little slow without parallelizing. Do you know which of the steps above consumes most of the time, and what its time complexity is? I guess maybe the insert_point step?

@mdbartos
Member

Yeah, I would say insert_point is the slowest step. I have time breakdowns here:
#28
