Seemingly incorrect results with `int` datatype #59
Comments
It seems that an overflow occurs in the last case, as I obtain the following warning:
I will check if it is possible to avoid it when I have a moment.
I added a possible fix in #60. Note that converting to floating point arrays is still preferable, because the AVL implementation is compiled in that case.
I will merge #60. Note that this can still overflow, especially on Windows, where the default integer type (used for integer reductions when the original type is smaller) has only 32 bits. As mentioned before, converting to floating point is preferred.
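For readers unfamiliar with this failure mode, here is a minimal sketch of how an integer reduction can wrap around silently. It is not the code from this issue or from #60. The array contents and the explicit `dtype=np.int32` accumulator are assumptions chosen to mimic a platform whose default integer is 32 bits wide.

```python
import numpy as np

# Illustrative data (an assumption, not the data from this issue).
x = np.arange(100_000, dtype=np.int32)

# Reference value: Python integers have arbitrary precision.
exact = sum(range(100_000))

# Forcing a 32-bit accumulator mimics a platform with a 32-bit default
# integer. The true sum exceeds the int32 range, so the result wraps.
wrapped = np.sum(x, dtype=np.int32)

# A 64-bit accumulator is wide enough for this particular input.
wide = np.sum(x, dtype=np.int64)

print("exact:  ", exact)
print("int64:  ", int(wide))
print("int32:  ", int(wrapped))
```

NumPy does not raise an error for this kind of array reduction, which is why the result can look plausible while being wrong.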
While experimenting with this package, I encountered a strange issue and thought it would be useful to post about it here. In short, it appears that the distance_correlation computation for `int` dtypes is incorrect when the size of the data is sufficiently large.

Here is a minimal example that can be used to replicate the issue:
Now when we run this code for small samples, the correlations for all dtypes agree, and do not substantially change with the sample size.
However, past a certain point, the computations diverge:
I've started casting everything to `float` before computing the correlations to avoid this issue.
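A rough sketch of that workaround, assuming the inputs are NumPy arrays. The helper name `as_float64` and the sample values are hypothetical and only serve to illustrate the cast.

```python
import numpy as np

def as_float64(a):
    """Return `a` as a float64 array (hypothetical helper name).

    Integer inputs are converted; float64 inputs are returned unchanged.
    """
    return np.asarray(a, dtype=np.float64)

# Illustrative values (an assumption): their squares exceed the int32 range.
x_int = np.array([60_000, 70_000, 80_000], dtype=np.int32)
x = as_float64(x_int)

# In float64 these products are represented exactly.
squares = x * x
print(squares)

# The converted arrays would then be passed to distance_correlation
# in place of the original integer arrays.
```

Note that float64 represents integers exactly only up to 2**53, so this swaps wraparound for ordinary rounding error on very large values.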