Improve performance of pairwise distances computation #40
Comments
Using the 'dcor.rowwise' function, I found a way to compute the pairwise distances. The idea is to first calculate the distances with 'dcor.rowwise' and then place them into the symmetric distance matrix. I implemented it on multivariate normal data. The Python implementation is as follows:
In the output, I print both the result of SciPy's pdist and the fast version that uses the rowwise function.
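The code from the comment above is not preserved here. As a rough illustration of the second step it describes (placing per-pair distances into a symmetric matrix), here is a minimal NumPy-only sketch. It is not the commenter's implementation and does not call dcor.rowwise; the function name pairwise_distances_symmetric and the sample data are assumptions made for this example.

```python
import numpy as np


def pairwise_distances_symmetric(x):
    """Euclidean distance matrix built from the upper-triangle pairs only.

    Hypothetical helper for illustration; not part of dcor.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Index pairs (i, j) with i < j, in row-major order.
    i, j = np.triu_indices(n, k=1)
    # One distance per pair, computed on the paired rows.
    condensed = np.linalg.norm(x[i] - x[j], axis=1)
    # Allocate the full matrix and mirror the values across the diagonal.
    dist = np.zeros((n, n))
    dist[i, j] = condensed
    dist[j, i] = condensed
    return dist


# Multivariate normal sample, as in the comment (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
sample = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3), size=5)
dist = pairwise_distances_symmetric(sample)
```

Each distance is computed once and written twice, so the work is roughly half that of filling the full matrix entry by entry.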
Comment on performance with rowwise: I've benchmarked rowwise on a dataset of ca. 23000 datasets, creating features with dcor.rowwise(dcor.distance_covariance, a, b). Using distance_covariance took 8 minutes 41 seconds, and using rowwise took 6 minutes 54 seconds, so roughly a 20% improvement.
It does not seem like such a big improvement (although everything helps, of course). To be honest, I am not very happy with
Yes, it seems
The computation of pairwise distances is the main bottleneck of the naive algorithm for distance covariance. Currently we use SciPy's cdist for NumPy arrays, and a broadcasting computation otherwise.
Any performance improvement to this function is thus welcome.
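For readers unfamiliar with the broadcasting approach mentioned above, the following is a minimal sketch of what such a computation can look like in NumPy. It is an assumption about the general technique, not dcor's actual code; the function name pairwise_distances_broadcast is hypothetical.

```python
import numpy as np


def pairwise_distances_broadcast(x, y):
    """Euclidean distances between every row of x and every row of y.

    Hypothetical helper for illustration; not part of dcor.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # (n, 1, d) - (1, m, d) broadcasts to an (n, m, d) array of differences.
    diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]
    # Reduce over the feature axis to get the (n, m) distance matrix.
    return np.sqrt(np.sum(diff ** 2, axis=-1))


x = np.array([[0.0, 0.0], [3.0, 4.0]])
y = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
d = pairwise_distances_broadcast(x, y)
```

The intermediate array of differences has n * m * d elements, so memory use, and not only arithmetic, can limit this approach for large inputs.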