-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] fused_l2_nn_argmin produces unexpected results #1036
Comments
cc @Nyrio have you encountered this behavior in your dealings with the fused l2 primitive? |
@benfred it looks like it might be related to the >>> X = cp.random.random((100, 5)).astype(cp.float32)
>>> ids = fused_l2_nn_argmin(X, X, sqrt=False).copy_to_host()
>>> ids
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99],
dtype=int32)
>>> ids = fused_l2_nn_argmin(X, X, sqrt=True).copy_to_host()
>>> ids
array([ 0, 74, 2, 3, 4, 5, 6, 79, 8, 9, 10, 11, 12, 13, 14, 63, 16,
17, 18, 19, 54, 95, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 34, 36, 37, 48, 39, 40, 41, 56, 43, 44, 45, 46, 47, 48, 49, 50,
51, 52, 53, 26, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 1, 75, 76, 57, 78, 79, 80, 81, 82, 83, 84,
85, 86, 87, 88, 89, 90, 22, 92, 93, 94, 95, 96, 97, 98, 99],
dtype=int32)
>>> |
… x and y (#1040) Solves #1036 Even when computing a sum of squares, the distance from a point to itself can apparently be `-0.0` in which case the square root is `nan` and comparisons are broken. Authors: - Louis Sugy (https://github.com/Nyrio) Approvers: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: #1040
@cjnolet What's the reason for reopening? |
@Nyrio we noticed yesterday that our Google tests haven't been running in CI since July because a change was made to the CI script causing them the runner to fail silently. Unfortunately, there's now some failures in the fused l2 nn tests and the ivf pq tests. We think the fused l2 nn is likely causing the ivf pq errors (through the balanced kmeans). I'm not sure yet if it's the round off errors from this issue that are the culprit. Our python fused p2 nn tests are passing but the Google tests are not and it looks like it's more issues where a neigbor's distance is supposed to be 0 but it's nonzero instead. What I found last night is that ether approximation in the polynomial expansion can cause this. For example, let's say we have a self looped distance, which should be 0. Let's say the norm for X and Y vectors are 2.54321. Those will match because we compute the norms using the same function. The dot product, however, could be 2.54322 because of round off errors. What I notice happening is that the difference between the norms and the dot product will leave something like 0.00001 instead of true 0 and once we take the sqrt of that, we get something like 0.00316 instead of 0. |
Ok, thanks for explaining. |
This should be fixed once and for all in this PR: #1904. I think the changes in that PR also beg for potential future optimizations. That round-off error is not easy to deal with. |
Describe the bug
Running
fused_l2_nn_argmin
sometimes produces results that appear incorrect.As a trivial test case, passing the same matrix for both inputs - sometimes returns results where each row isn't
it's own nearest neighbour.
Steps/Code to reproduce bug
Environment details (please complete the following information):
The text was updated successfully, but these errors were encountered: