QUESTION: Simulating sampling of points in streaming detection #91

stianvale · 2021-06-16T07:56:46Z

Hi!
I've tested both your implementation of 'streaming detection' and 'batch detection'. So far, I'm getting the best results with the 'batch detection'. However, I want to use the streaming approach to dynamically update the model according to a continuous stream of data.

My current understanding is that 'batch detection' performs better because of the random sampling of points. With 'streaming detection', all trees contain the same points. Therefore, I tested an approach where some points are randomly deleted from trees after calculating the codisp. That way, the trees contain different points, which in a way simulates random sampling of points. My current results tell me that this works well.

Does this sound like a valid alternative to the standard 'streaming detection', or are there some traps I'm missing here?
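
For reference, the idea could be sketched roughly as below. This is only an illustration, not code from the repo: `score_then_thin`, `drop_prob`, and the choice to drop the just-scored point are assumptions made for the example. The trees are assumed to behave like `rrcf.RCTree` objects, i.e. to expose `insert_point`, `forget_point`, `codisp`, and a `leaves` dict keyed by integer index.

```python
import random


def score_then_thin(forest, index, point, tree_size, drop_prob, rng=random):
    """Insert `point` in every tree, average its CoDisp, then randomly drop it
    from some trees so the trees end up holding different points."""
    total = 0.0
    for tree in forest:
        # Trees no longer share the same indices, so evict the oldest index
        # that this particular tree still holds once it is full.
        if len(tree.leaves) >= tree_size:
            tree.forget_point(min(tree.leaves))
        tree.insert_point(point, index=index)
        total += tree.codisp(index)
        # Assumption: the just-scored point is the one that may be dropped.
        if rng.random() < drop_prob:
            tree.forget_point(index)
    return total / len(forest)
```

Because each tree may hold a different set of indices afterwards, the fixed `index - tree_size` eviction from the README's streaming example would no longer be safe, which is why the sketch evicts the oldest index each tree actually contains.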

mdbartos · 2021-06-17T20:25:13Z

Greetings,

The method for sampling included in the README was chosen for demonstration purposes---the implementation is short and easy to read. It's definitely not the only way to do sampling, and different sampling methods are encouraged.

The original RRCF paper proposes 'reservoir sampling', which would correspond to uniform sampling in time for the batch mode case. (See: https://en.wikipedia.org/wiki/Reservoir_sampling)

Ultimately the choice of sampling method will depend on the user's needs---namely, how far back in time you want the algorithm to 'remember'.
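
As a rough illustration of reservoir sampling (Algorithm R from the Wikipedia article), here is a minimal standard-library sketch. It is not part of this repo: the `Reservoir` class and its `offer` method are hypothetical names for the example. Each tree would get its own reservoir, which decides whether a new stream index should enter that tree and which held index it replaces.

```python
import random


class Reservoir:
    """Tracks which stream indices one tree should hold (Algorithm R)."""

    def __init__(self, size, seed=None):
        self.size = size
        self.seen = 0      # number of indices offered so far
        self.slots = []    # indices currently held
        self.rng = random.Random(seed)

    def offer(self, index):
        """Return (accepted, evicted): whether to insert `index`, and which
        held index (if any) should be removed to make room for it."""
        self.seen += 1
        if len(self.slots) < self.size:
            self.slots.append(index)
            return True, None
        j = self.rng.randrange(self.seen)  # uniform over 0 .. seen-1
        if j < self.size:
            evicted = self.slots[j]
            self.slots[j] = index
            return True, evicted
        return False, None
```

With one reservoir per tree, an accepted index would translate to `tree.forget_point(evicted)` (when `evicted` is not `None`) followed by `tree.insert_point(point, index=index)`. Every index seen so far has the same chance of being held, so the tree's memory reaches arbitrarily far back in time; a method that favours recent points would be needed for a shorter memory.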

MDB

stianvale · 2021-06-21T09:22:59Z

Thanks for your response, @mdbartos !

Cool, yeah, I see that's the default sampling technique of SageMaker's RRCF as well. I'll test out reservoir sampling then. Have you implemented it for this repo before? In that case, maybe you could share the code?

stianvale changed the title ~~QUESTION: Simulating sampling of points in streaming algorithm~~ QUESTION: Simulating sampling of points in streaming detection Jun 16, 2021
