
Question: is there a fast method for dcor.independence.distance_covariance_test #30

Open
mycarta opened this issue May 30, 2021 · 2 comments

mycarta commented May 30, 2021

With reference to the example in this notebook, this weekend I compared the performance of the MERGESORT method vs. the NAIVE method on a toy dataset of 8 columns × 21 rows:

%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='NAIVE'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 24.3 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

vs:

%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='MERGESORT'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 17.4 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Since I sometimes work with many thousands of rows, and possibly more columns, I wonder if there is a way to similarly improve the speed of the pairwise p-value calculation:

p = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.independence.distance_covariance_test(
            col1, col2, exponent=1.0, num_resamples=2000)[0],
        axis=0, arr=data),
    axis=0, arr=data)
>>> 4.38 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
vnmabus (Owner) commented May 31, 2021

No, not as of today. The code would need a separate branch to handle that case, but it should be relatively easy to implement (adding a new function in _hypothesis that performs a permutation test using the original array instead of the distance matrix, and using it when the method is not NAIVE). If you want to try a PR, I could review it.
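
A minimal sketch of what that new function could look like (hypothetical code, not what is in _hypothesis today; it assumes dcor.distance_covariance takes the same method argument as dcor.distance_correlation, and the fast methods need univariate data and exponent 1):

import numpy as np
import dcor

# Hypothetical helper, not part of dcor: a permutation test that
# shuffles the original observations, so the statistic can be
# recomputed with any fast method instead of reusing distance matrices.
def permutation_dcov_test(x, y, num_resamples=2000, method='MERGESORT',
                          random_state=None):
    rng = np.random.default_rng(random_state)
    observed = dcor.distance_covariance(x, y, method=method)
    # Count permuted statistics at least as large as the observed one.
    greater = sum(
        dcor.distance_covariance(x, rng.permutation(y), method=method)
        >= observed
        for _ in range(num_resamples)
    )
    # The +1 correction keeps the estimated p-value away from exactly zero.
    return (greater + 1) / (num_resamples + 1)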

BTW, if you have additional CPUs you can use the 'AVL' method in distance_correlation and the rowwise function for an extra boost.
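
For example, something along these lines should work (just a sketch, assuming data is a NumPy array and that rowwise forwards extra keyword arguments such as method to the statistic):

import numpy as np
import dcor

# Build all column pairs so rowwise can evaluate them in one batch:
# rowwise pairs row i of the first array with row i of the second.
cols = np.asarray(data).T                   # one data column per row
n = cols.shape[0]
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
dc = dcor.rowwise(
    dcor.distance_correlation,
    cols[i.ravel()],
    cols[j.ravel()],
    method='AVL',  # assumption: extra kwargs are forwarded to the statistic
).reshape(n, n)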

mycarta (Author) commented May 31, 2021

I am at capacity until the fall. After the summer, if I have more time as I hope, I can give it a try.

For the purposes of my current projects, I am going to decimate my array quite heavily for the time being:

decimated_df = data.sample(frac=0.05, random_state=1)  # sample already returns a new DataFrame, so copy() is unnecessary
