Hi!
I've tested both your implementation of 'streaming detection' and 'batch detection'. So far, I'm getting the best results with the 'batch detection'. However, I want to use the streaming approach to dynamically update the model according to a continuous stream of data.
My current understanding is that 'batch detection' performs better because of the random sampling of points. With 'streaming detection', all trees contain the same points. Therefore, I tested an approach where some points are randomly deleted from each tree after calculating the codisp. That way, the trees contain different points, which in a way simulates random sampling of points. My current results tell me that this works well.
Does this sound like a valid alternative to the standard 'streaming detection', or are there some traps I'm missing here?
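For concreteness, the deletion step described above might look roughly like the sketch below. It only tracks which point indices each tree holds, using plain Python sets as stand-ins for the trees. The function name `forget_random` and the `drop_prob` parameter are hypothetical, and the exact deletion rule (independent per-point, per-tree coin flips) is an assumption rather than the precise procedure that was tested.

```python
import random

def forget_random(tree_indices, drop_prob, rng):
    """Pick indices to delete from each tree, each with probability drop_prob.

    tree_indices: list of sets, one per tree, holding the point indices
                  currently stored in that tree (stand-in for real trees).
    Returns a list (one entry per tree) of the indices that were dropped.
    """
    dropped = []
    for indices in tree_indices:
        # Iterate in sorted order so results are reproducible for a given seed.
        chosen = [i for i in sorted(indices) if rng.random() < drop_prob]
        indices.difference_update(chosen)
        dropped.append(chosen)
    return dropped

# Three "trees" that start with the same ten points, as in streaming detection.
rng = random.Random(0)
trees = [set(range(10)) for _ in range(3)]
dropped = forget_random(trees, drop_prob=0.3, rng=rng)
```

With real trees, each returned index would be removed from the corresponding tree with the library's deletion call (in `rrcf`, `tree.forget_point(index)`). One thing to watch with this scheme is that tree sizes vary from step to step, so the sliding-window logic that forgets the oldest point has to cope with indices that are already gone.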
stianvale changed the title from "QUESTION: Simulating sampling of points in streaming algorithm" to "QUESTION: Simulating sampling of points in streaming detection" on Jun 16, 2021
The sampling method included in the README was chosen for demonstration purposes: the implementation is short and easy to read. It's definitely not the only way to do sampling, and experimenting with different sampling methods is encouraged.
Cool, yeah, I see that's the default sampling technique of SageMaker's RRCF as well. I'll test out reservoir sampling then. Have you implemented it for this repo before? In that case, maybe you could share the code?
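In case it helps anyone landing here, below is a minimal sketch of per-tree reservoir sampling (the classic "Algorithm R"), not code from this repo. The class name `IndexReservoir` and its `offer` method are hypothetical. It only decides which stream indices a tree should hold; the tree operations themselves are left to the caller.

```python
import random

class IndexReservoir:
    """Uniform random sample of at most `capacity` stream indices (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []   # indices currently held in the sample
        self.seen = 0     # number of indices offered so far
        self.rng = random.Random(seed)

    def offer(self, index):
        """Offer a newly arrived index.

        Returns (accepted, evicted): `accepted` says whether the index entered
        the sample, `evicted` is the index it replaced (or None).
        """
        self.seen += 1
        if len(self.items) < self.capacity:
            # Reservoir not yet full: always keep the new index.
            self.items.append(index)
            return True, None
        # Keep the new index with probability capacity / seen,
        # replacing a uniformly chosen slot.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            evicted = self.items[slot]
            self.items[slot] = index
            return True, evicted
        return False, None

# One reservoir per tree, each with its own seed, so trees hold different points.
reservoirs = [IndexReservoir(capacity=256, seed=t) for t in range(40)]
```

To use it with `rrcf`, the idea would be: for each incoming point `i` and each tree, call `offer(i)` on that tree's reservoir; if `evicted` is not `None`, call `tree.forget_point(evicted)`, and if `accepted` is true, call `tree.insert_point(point, index=i)`. Note that plain reservoir sampling draws uniformly from the whole stream history, so older points are never aged out on purpose; if the data drifts, a windowed or time-decayed variant may be a better fit.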