Dealing with data-stream of constant values during a certain period #80

Open
shfa5275 opened this issue Nov 8, 2020 · 1 comment

Comments


shfa5275 commented Nov 8, 2020

In certain cases, a stream may keep receiving constant values for a while. When that happens, xmin == xmax, so l = xmax - xmin is all zeros and l.sum() is 0. The division l /= l.sum() then yields l = nan, which leads to an exception in the following code:

def _cut(self, X, S, parent=None, side='l'):
    # Find max and min over all d dimensions
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)

    # Compute l
    l = xmax - xmin
    l /= l.sum()

Any suggestions for handling this "special case" gracefully?
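
For reference, the nan can be reproduced outside the tree code. This is only a minimal sketch: the array `X` and mask `S` are made-up stand-ins for a window of identical points, not data from a real stream.

```python
import numpy as np

# Four identical 2-D points, standing in for a window of constant values.
X = np.full((4, 2), 3.0)
S = np.ones(4, dtype=bool)

xmax = X[S].max(axis=0)
xmin = X[S].min(axis=0)

# With identical points the per-dimension range is zero everywhere.
l = xmax - xmin

# 0 / 0 produces nan; errstate only silences NumPy's "invalid value" warning.
with np.errstate(invalid="ignore"):
    l /= l.sum()

print(l)  # [nan nan]
```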

Member

mdbartos commented Nov 8, 2020

I do not think the algorithm is well-defined for the case where all points are exactly identical, because you cannot partition the point set.

https://klabum.github.io/rrcf/tree-construction.html

In this case, you would essentially skip the tree construction algorithm and create a root node that is also a leaf that contains all the points in the set.
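
One way to detect that case before dividing might look like the sketch below. The helper name `cut_probabilities` is hypothetical and not part of rrcf; it only shows the zero-range check, and the caller would still need to build the single root-leaf node when it gets `None` back.

```python
import numpy as np

def cut_probabilities(X, S):
    """Return per-dimension cut probabilities, or None if no cut is possible.

    Hypothetical helper, not part of rrcf: it mirrors the first lines of
    _cut but checks for a zero total range before dividing.
    """
    xmax = X[S].max(axis=0)
    xmin = X[S].min(axis=0)
    l = xmax - xmin
    total = l.sum()
    if total == 0:
        # All selected points are identical, so the set cannot be partitioned.
        return None
    return l / total

# Identical points: nothing to cut, so the caller should create a leaf.
same = np.full((4, 2), 3.0)
print(cut_probabilities(same, np.ones(4, dtype=bool)))  # None

# Distinct points: ranges are [1, 3], so the probabilities are [0.25, 0.75].
distinct = np.array([[0.0, 0.0], [1.0, 3.0]])
print(cut_probabilities(distinct, np.ones(2, dtype=bool)))
```

Returning `None` rather than raising keeps the decision about what to build (for example, one leaf holding all the points) with the caller.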
