Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge is invalid #15

Open
kurtbrose opened this issue Mar 11, 2015 · 0 comments
Open

merge is invalid #15

kurtbrose opened this issue Mar 11, 2015 · 0 comments

Comments

@kurtbrose
Copy link

The merge operation is not valid for this algorithm. It is not a distributed algorithm, and intermediate results from two streams cannot be combined.

The definition of g from the paper, Delta in the go code: "gi is the difference between the lowest possible rank of item i and the lowest possible rank of item i − 1"

Consider two sequences, with stored nodes marked with ():

...(1)111(3)...

...0(2)...

The node with value=3 will have g/Delta of 4. After the lists are merged:

...0(1)111(2)(3)...

The node with value=3 still has g/Delta of 4, but this is incorrect. The actual g/Delta should be 1. The invariant of the algorithm is broken, and therefore the guaranteed error bounds are also broken.

If there are no longer guaranteed error bounds, you might as well use a reservoir sample.

The fix is to remove the merge() operation. (Multiple goroutines would need to either push points into a queue, or use a lock around the data structure, or some other fancy architecture.)

Q-digest is an example of a distributed algorithm in this space:
http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf

Edit: the original authors of the paper also extended it to be a distributed algorithm.
http://www.cis.upenn.edu/~mbgreen/papers/pods04.pdf
The data structure used is a bit different, but similar. This would require a total re-write of the core algorithm, but would allow for a valid merge() operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant