Handling outliers with Online Topic Modeling with River #948

Answered by MaartenGr
vantubbe asked this question in Q&A

The difficulty here is the definition of an outlier. In online clustering, a point that is an outlier at timestep t might not be one at timestep t+10, once the model has seen much more data. The reverse is also possible. I believe there are algorithms in River that take outliers into account, such as DenStream, but I have not used it myself yet. Some tweaking may be needed there to generate the -1 clusters.
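
To make the timestep point concrete, here is a minimal pure-Python sketch. It does not use River or DenStream; `ThresholdOnlineClusterer` and `OutlierAwareWrapper` are hypothetical names made up for illustration, and the `radius` and `min_points` parameters are assumptions, not anything from River's API. The wrapper only assumes that the online topic model needs a `partial_fit` method and a `labels_` attribute. A point is labeled -1 while its nearest cluster is too far away or has seen too few points.

```python
import math


class ThresholdOnlineClusterer:
    """Toy online clusterer (hypothetical, not a River class).

    A point is an outlier (-1) if its nearest centroid is farther than
    `radius` or has absorbed fewer than `min_points` points so far.
    """

    def __init__(self, radius=1.0, min_points=3):
        self.radius = radius
        self.min_points = min_points
        self.centroids = []  # each entry: [center coordinates, point count]

    def _nearest(self, x):
        best, best_dist = None, math.inf
        for i, (center, _) in enumerate(self.centroids):
            dist = math.dist(x, center)
            if dist < best_dist:
                best, best_dist = i, dist
        return best, best_dist

    def learn_one(self, x):
        i, dist = self._nearest(x)
        if i is None or dist > self.radius:
            # Too far from everything seen so far: start a new cluster.
            self.centroids.append([list(x), 1])
        else:
            # Move the running mean of the nearest cluster toward x.
            center, n = self.centroids[i]
            new_center = [(c * n + v) / (n + 1) for c, v in zip(center, x)]
            self.centroids[i] = [new_center, n + 1]

    def predict_one(self, x):
        i, dist = self._nearest(x)
        if i is None or dist > self.radius:
            return -1
        if self.centroids[i][1] < self.min_points:
            return -1  # cluster is still too small to be trusted
        return i


class OutlierAwareWrapper:
    """Exposes `partial_fit` and `labels_` around a learn_one/predict_one model."""

    def __init__(self, model):
        self.model = model
        self.labels_ = []

    def partial_fit(self, X, y=None):
        for x in X:
            self.model.learn_one(x)
        # Labels for the current batch only, using the updated model.
        self.labels_ = [self.model.predict_one(x) for x in X]
        return self


wrapper = OutlierAwareWrapper(ThresholdOnlineClusterer(radius=1.0, min_points=3))

# Timestep t: the point (5, 5) is alone, so it is labeled -1.
batch_1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
labels_t0 = list(wrapper.partial_fit(batch_1).labels_)

# Later timestep: more points arrive near (5, 5), so it is no longer an outlier.
batch_2 = [(5.1, 5.0), (5.0, 5.1)]
wrapper.partial_fit(batch_2)
label_later = wrapper.model.predict_one((5.0, 5.0))

print(labels_t0, wrapper.labels_, label_later)
```

The same point gets -1 in the first batch and a regular cluster label once its neighborhood has filled in. This is why any -1 assignment in an online setting is only valid for the timestep at which it was made.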

Answer selected by vantubbe