With reference to the example in this notebook, this weekend I compared the performance of the MERGESORT method vs. the NAIVE method with a toy dataset of 8 columns x 21 rows (with import numpy as np and import dcor already run):
%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='NAIVE'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 24.3 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
vs:
%%timeit
dc = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.distance_correlation(col1, col2, method='MERGESORT'),
        axis=0, arr=data),
    axis=0, arr=data)
>>> 17.4 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Since I sometimes work with many thousands of rows, and possibly more columns, I wonder if there is a way to similarly improve the speed of the pairwise p-value calculation:
p = np.apply_along_axis(
    lambda col1: np.apply_along_axis(
        lambda col2: dcor.independence.distance_covariance_test(
            col1, col2, exponent=1.0, num_resamples=2000)[0],
        axis=0, arr=data),
    axis=0, arr=data)
>>> 4.38 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No, not as of today. The code would need a separate branch to handle that case, but it should be relatively easy to implement: add a new function in _hypothesis that performs the permutation test using the original arrays instead of the distance matrices, and use it whenever the method is not NAIVE. If you want to try a PR I could review it.
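A minimal sketch of that idea, for reference only (the helper name permutation_pvalue and the p-value convention are illustrative, not dcor's actual internals):

import numpy as np
import dcor

def permutation_pvalue(x, y, num_resamples=2000, method='MERGESORT', seed=None):
    # Illustrative sketch: recompute the statistic from the original
    # arrays on each resample, so the fast univariate methods
    # (MERGESORT, AVL) apply. The NAIVE path can instead permute a
    # precomputed distance matrix. dcor's internal statistic and
    # p-value convention may differ from what is shown here.
    rng = np.random.default_rng(seed)
    observed = dcor.u_distance_covariance_sqr(x, y, method=method)
    count = sum(
        dcor.u_distance_covariance_sqr(x, rng.permutation(y), method=method)
        >= observed
        for _ in range(num_resamples)
    )
    # Permutation p-value with the usual +1 correction.
    return (count + 1) / (num_resamples + 1), observed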
BTW, if you have additional CPUs, you can use the 'AVL' method in distance_correlation together with the rowwise function for an extra boost.
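For example, something along these lines (a sketch based on my reading of the dcor docs, so details may differ; data is a random stand-in for the toy dataset, and all column pairs are stacked so each pair becomes one row-wise computation):

import itertools
import numpy as np
import dcor

data = np.random.normal(size=(21, 8))  # stand-in for the 21 x 8 toy dataset

# Stack every column pair so each pair becomes one row of x and y.
pairs = list(itertools.combinations(range(data.shape[1]), 2))
x = np.stack([data[:, i] for i, j in pairs])
y = np.stack([data[:, j] for i, j in pairs])

# 'AVL' has a compiled implementation, and COMPILE_PARALLEL should let
# rowwise distribute the pairs over the available CPUs (falling back to
# a plain loop where no optimized row-wise path exists).
dc_flat = dcor.rowwise(
    dcor.distance_correlation, x, y,
    method='AVL',
    compile_mode=dcor.CompileMode.COMPILE_PARALLEL,
)

dc_flat then holds one value per column pair, which can be scattered back into the symmetric 8 x 8 matrix.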