Difference between edm-xmap executable and Python binding xmap #49

keichi · 2023-09-21T00:51:31Z

@keichi, I'm trying to understand the difference between the edm-xmap executable and the Python binding xmap. Are they different? The documentation says the executable performs pairwise cross mapping using simplex projection. What does the Python binding xmap do? Furthermore, how do the xmap and ccm Python functions differ? Are the random samples drawn in the CCM function excluded from the embedding procedure?

A follow-up question: why do the Python bindings run slower than the executable? Both appear to use the GPU, but the executable calculates the all-to-all CCM for a 10,000 x 500 matrix in about 20 seconds, whereas the Python code takes forever (I didn't let it finish). It's almost as if the Python binding is only partially utilizing the GPU.

Originally posted by @davidgwyrick in #40 (comment)

keichi · 2023-09-21T01:38:28Z

@davidgwyrick The edm-xmap executable and the Python xmap() function should produce the same output and have the same performance. The backstory is that we initially had only the edm-xmap executable, but then we expanded the library and added Python bindings.

The differences between xmap() and ccm() are:

- xmap() omits the convergence check part of CCM. CCM rho is calculated using the full library only, without random sampling. Our use case is to use xmap() to screen for promising combinations and then test for convergence using ccm().
- xmap() accepts multiple time series (tested with up to 100K) and computes the CCM rho for every pair of time series (in both directions). ccm() accepts only two time series. We also plan to implement an all-to-all version of ccm(), but I haven't had the bandwidth yet.
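To make the "full library, no random sampling" description concrete, here is a minimal pure-NumPy sketch of an all-to-all simplex cross map: time-delay embedding, E+1 nearest neighbours with exponential weights, and Pearson rho between cross-mapped and observed values. It is an illustration of the technique, not kEDM's implementation. The helper names (`coupled_logistic`, `embed`, `xmap_rho`, `all_to_all_xmap`), the choice to exclude each point from its own neighbour set, and the `rho[i, j]` orientation are all assumptions of this sketch and may differ from what kEDM does.

```python
import numpy as np


def coupled_logistic(T=500, x0=0.4, y0=0.2):
    """Hypothetical toy data: two coupled logistic maps where x forces y more strongly than y forces x."""
    x, y = np.empty(T), np.empty(T)
    x[0], y[0] = x0, y0
    for t in range(T - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x, y


def embed(x, E, tau=1):
    """Time-delay embedding: row r is (x[t], x[t - tau], ..., x[t - (E - 1) * tau]), t = r + (E - 1) * tau."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])


def xmap_rho(lib, target, E, tau=1):
    """Simplex cross map on the full library: estimate `target` from the shadow manifold of `lib`."""
    M = embed(lib, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau:]   # align target with embedding rows
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                           # assumption: a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :E + 1]                 # E + 1 nearest neighbors
    dk = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))        # simplex weights exp(-d / d_min)
    pred = (w * y[nn]).sum(axis=1) / w.sum(axis=1)
    return np.corrcoef(pred, y)[0, 1]


def all_to_all_xmap(X, E, tau=1):
    """X has shape (T, N); E holds one embedding dimension per series.
    rho[i, j] = skill of estimating series j from the manifold of series i,
    so a high rho[i, j] is CCM evidence that series j drives series i."""
    N = X.shape[1]
    rho = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            rho[i, j] = xmap_rho(X[:, i], X[:, j], E[i], tau)
    return rho


x, y = coupled_logistic()
rho = all_to_all_xmap(np.column_stack([x, y]), E=[2, 2])
print(np.round(rho, 3))
```

Each rho here uses every embedded point as both library and prediction target (minus the point itself), which is why there is no "number of samples" knob: there is nothing to resample.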

keichi · 2023-09-21T01:40:37Z

The Python xmap() should perform the same as the executable. Can you run the following and paste the results?

```python
import kedm
print(kedm.__file__)
print(kedm.get_kokkos_config())
```

davidgwyrick · 2023-09-21T03:19:19Z

I figured out where I went wrong in the installation process. After building the source code, I ran `pip install -e .` in the kEDM directory rather than `pip3 install git+https://github.com/keichi/kEDM.git`. I think the -e option might have caused it to fall back to a build without GPU support, but I'm not sure. It works now! SO FAST! I'm so excited. My code is slow as a snail compared to this.

By checking for convergence, do you mean running different library sizes to see if the CCM rho changes? And if it does, does that mean the pair of time series may be causally interacting? Based on your explanation of xmap vs. ccm, I think I've always been doing just xmap in my own code. I'm not sure if you're aware of this paper: https://www.biorxiv.org/content/10.1101/2020.11.23.394916v3.abstract
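In the CCM literature, convergence usually means that cross-map rho *increases* with library size L and then levels off, rather than merely changing. The sketch below computes such a curve by drawing random libraries of L embedded points and predicting every point. It is a generic NumPy illustration of the procedure, not kEDM's ccm(); the helper names, the default of 20 random libraries per size, and the coupled-logistic toy data are assumptions for illustration only.

```python
import numpy as np


def coupled_logistic(T=500, x0=0.4, y0=0.2):
    """Hypothetical toy data: x forces y more strongly than y forces x."""
    x, y = np.empty(T), np.empty(T)
    x[0], y[0] = x0, y0
    for t in range(T - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x, y


def embed(x, E, tau=1):
    """Time-delay embedding: row r is (x[t], x[t - tau], ..., x[t - (E - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])


def ccm_curve(lib, target, E, lib_sizes, n_samples=20, tau=1, seed=0):
    """Mean cross-map rho (estimating `target` from the shadow manifold of `lib`) for each
    library size L, averaged over `n_samples` random libraries of L rows. L should exceed E + 1."""
    rng = np.random.default_rng(seed)
    M = embed(lib, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau:]
    n = len(M)
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a prediction point never uses itself as a neighbor
    curve = []
    for L in lib_sizes:
        rhos = []
        for _ in range(n_samples):
            lib_idx = rng.choice(n, size=L, replace=False)  # random library rows
            dl = d[:, lib_idx]                              # distances from every point to the library
            nn = np.argsort(dl, axis=1)[:, :E + 1]          # E + 1 nearest library points
            dk = np.take_along_axis(dl, nn, axis=1)
            w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))
            pred = (w * y[lib_idx][nn]).sum(axis=1) / w.sum(axis=1)
            rhos.append(np.corrcoef(pred, y)[0, 1])
        curve.append(np.mean(rhos))
    return np.array(curve)


x, y = coupled_logistic()
sizes = [10, 25, 50, 100, 200, 400]
y_xmap_x = ccm_curve(y, x, E=2, lib_sizes=sizes)  # estimate x from M_y: tests x -> y
x_xmap_y = ccm_curve(x, y, E=2, lib_sizes=sizes)  # estimate y from M_x: tests y -> x
for L, a, b in zip(sizes, y_xmap_x, x_xmap_y):
    print(f"L={L:4d}  y xmap x: {a:.3f}  x xmap y: {b:.3f}")
```

A rising, saturating curve in one direction is the pattern usually read as evidence of causal forcing; a flat curve suggests the high rho comes from something other than coupling.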

How do you estimate the optimal embedding dimension of a time series? We do it by finding the smallest embedding dimension that gives the largest CCM rho for a pair of time series, the thought being that you would expect the optimal embedding dimension to change for different pairs of time series, right? But based on the function, it looks like you're estimating the embedding dimension per time series, not per pair of time series?
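The two strategies contrasted here can be sketched side by side. `edim_per_series` picks the E with the best one-step-ahead simplex self-forecast of a single series (a common EDM convention); `edim_per_pair` implements the per-pair rule described above, the smallest E whose cross-map rho is within `tol` of the best. Both are hypothetical helpers written in plain NumPy; kEDM's own edim() may split library and prediction sets differently or use another criterion, so treat this only as an illustration of the idea.

```python
import numpy as np


def coupled_logistic(T=500, x0=0.4, y0=0.2):
    """Hypothetical toy data: x forces y more strongly than y forces x."""
    x, y = np.empty(T), np.empty(T)
    x[0], y[0] = x0, y0
    for t in range(T - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x, y


def embed(x, E, tau=1):
    """Time-delay embedding: row r is (x[t], x[t - tau], ..., x[t - (E - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])


def simplex_rho(lib, target, E, Tp=0, tau=1):
    """Predict target[t + Tp] from the E + 1 nearest neighbors of the lib embedding at t.
    With Tp=0 and lib != target this is a cross map; with lib == target and Tp=1, a self-forecast."""
    M = embed(lib, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau:]
    n = len(M) - Tp                      # rows whose target Tp steps ahead exists
    P = M[:n]
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # leave-one-out: exclude the query point itself
    nn = np.argsort(d, axis=1)[:, :E + 1]
    dk = np.take_along_axis(d, nn, axis=1)
    w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))
    pred = (w * y[nn + Tp]).sum(axis=1) / w.sum(axis=1)
    return np.corrcoef(pred, y[Tp:Tp + n])[0, 1]


def edim_per_series(x, E_max=10, Tp=1):
    """Per-series E: the E with the best Tp-step-ahead self-forecast of x."""
    rhos = [simplex_rho(x, x, E, Tp=Tp) for E in range(1, E_max + 1)]
    return int(np.argmax(rhos)) + 1


def edim_per_pair(lib, target, E_max=10, tol=0.0):
    """Per-pair E: smallest E whose cross-map rho is within `tol` of the best cross-map rho."""
    rhos = np.array([simplex_rho(lib, target, E) for E in range(1, E_max + 1)])
    return int(np.argmax(rhos >= rhos.max() - tol)) + 1


x, y = coupled_logistic()
print("per-series E for x:", edim_per_series(x))
print("per-pair E, estimating x from M_y:", edim_per_pair(y, x))
```

One design consequence: the per-series choice depends only on the library series, so it can be computed once per series (N runs), while the per-pair choice needs a separate search for each of the N^2 pairs.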

davidgwyrick · 2023-09-22T23:31:43Z

Basically, what I want to figure out is how to use your ccm Python binding to obtain a cross-validated measure of CCM rho. Are the random samples that are tested to calculate CCM rho excluded from the embedding / nearest-neighbor search? A similar question: what portion of the data is used as the library when you give a library size smaller than the length of the library time series? Does that make sense? Usually I do simple k-fold cross-validation, but with the random samples here I can't.
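For reference, the k-fold approach mentioned here can be written directly as a simplex cross map. This is a generic NumPy sketch, not something kEDM provides as far as this thread shows; `kfold_xmap_rho` and its `exclusion` parameter are hypothetical. Folds are contiguous blocks, and each test block is removed from the library together with `exclusion` rows on either side (by default (E - 1) * tau), so that no library delay vector shares raw samples with a test point.

```python
import numpy as np


def coupled_logistic(T=500, x0=0.4, y0=0.2):
    """Hypothetical toy data: x forces y more strongly than y forces x."""
    x, y = np.empty(T), np.empty(T)
    x[0], y[0] = x0, y0
    for t in range(T - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
    return x, y


def embed(x, E, tau=1):
    """Time-delay embedding: row r is (x[t], x[t - tau], ..., x[t - (E - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])


def kfold_xmap_rho(lib, target, E, k=5, tau=1, exclusion=None):
    """K-fold cross-validated simplex cross map (estimate `target` from the shadow manifold of `lib`).
    Returns the pooled Pearson rho over all out-of-fold predictions."""
    if exclusion is None:
        exclusion = (E - 1) * tau  # delay vectors this close share raw samples with the test block
    M = embed(lib, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau:]
    n = len(M)
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
    preds = np.empty(n)
    for test in np.array_split(np.arange(n), k):  # contiguous folds keep neighboring times together
        mask = np.ones(n, dtype=bool)
        mask[max(0, test[0] - exclusion):test[-1] + exclusion + 1] = False
        lib_idx = np.flatnonzero(mask)            # library = everything outside the widened test block
        dl = d[np.ix_(test, lib_idx)]
        nn = np.argsort(dl, axis=1)[:, :E + 1]
        dk = np.take_along_axis(dl, nn, axis=1)
        w = np.exp(-dk / np.maximum(dk[:, :1], 1e-12))
        preds[test] = (w * y[lib_idx][nn]).sum(axis=1) / w.sum(axis=1)
    return np.corrcoef(preds, y)[0, 1]


x, y = coupled_logistic()
print("5-fold CV, estimating x from M_y:", kfold_xmap_rho(y, x, E=2))
```

Contiguous folds are used instead of shuffled ones because, in an autocorrelated series, a shuffled test point would usually have its temporal neighbors in the library and the score would be optimistic.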

I apologize for all the questions, but I'm not proficient enough in C++ to really understand the code.

davidgwyrick · 2023-10-13T00:25:00Z

@keichi, hey, I know you're probably super busy, but if you could help me understand the following, I would greatly appreciate it.

- The xmap function does not give an option for the number of random samples to be tested, while ccm does. How does xmap evaluate the cross-map skill, then? Is it not cross-validated?
- For the ccm function, are the random samples that are tested to calculate CCM rho excluded from the embedding / nearest-neighbor search?
- Similarly, what portion of the data is used as the library when you give a library size smaller than the length of the library time series?
- How do you estimate the optimal embedding dimension of a time series?

keichi changed the title ~~Difference between **edm-xmap** executable and Python binding _xmap_~~ Difference between edm-xmap executable and Python binding xmap Sep 21, 2023
