Difference between edm-xmap executable and Python binding xmap #49

Open
keichi opened this issue Sep 21, 2023 · 5 comments


@keichi
Owner

keichi commented Sep 21, 2023

@keichi, I'm trying to understand the difference between the edm-xmap executable and the Python binding xmap. Are they different? The documentation says the executable performs pairwise cross mapping using simplex projection. What does the Python binding xmap do? Furthermore, how are the xmap and ccm Python functions different? Are the random samples drawn in the ccm function excluded from the embedding procedure?

A follow-up question is why the Python bindings run slower than the executable. Both appear to be using the GPU, but the executable calculates the all-to-all CCM for a 10,000 x 500 matrix in about 20 seconds, whereas the Python code takes forever (I didn't let it finish). It's almost as if the Python binding is only partially utilizing the GPU.

Originally posted by @davidgwyrick in #40 (comment)

@keichi changed the title from "Difference between **edm-xmap** executable and Python binding _xmap_" to "Difference between edm-xmap executable and Python binding xmap" on Sep 21, 2023
@keichi
Owner Author

keichi commented Sep 21, 2023

@davidgwyrick The edm-xmap executable and the Python xmap() function should produce the same output and performance. The backstory is that we initially had the edm-xmap executable only, but then we expanded the library and added Python bindings.

The differences between xmap() and ccm() are:

  • xmap() omits the convergence check part of CCM. CCM rho is calculated using the full library only, without random sampling. Our use case is to use xmap() to screen promising combinations and then test for convergence using ccm().
  • xmap() accepts multiple time series (tested up to 100K) and computes the CCM rho for every pair of time series (both directions). ccm() accepts only two time series. We also plan to implement an all-to-all version of ccm(), but I haven't had the bandwidth yet.
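
To make the distinction above concrete, here is a minimal pure-Python sketch of the two ideas: one cross-map rho computed with the full library (the xmap()-style screening step) versus rho averaged over random library subsets of increasing size (the ccm()-style convergence check). This is not kEDM's implementation and does not call kEDM. The helper names (embed, pearson, cross_map_rho, coupled_logistic), the toy coupled logistic maps, and the choice to exclude only the predicted point itself from the neighbor search are all assumptions made for illustration.

```python
import math
import random

def embed(x, E, tau=1):
    """Time-delay embedding: each row is (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    start = (E - 1) * tau
    return [[x[t - k * tau] for k in range(E)] for t in range(start, len(x))]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def cross_map_rho(lib, target, E, tau=1, lib_idx=None):
    """Predict target from the delay embedding of lib via simplex projection.

    lib_idx restricts which embedded points may serve as neighbors
    (None means the full library). The predicted point itself is always
    excluded; that is an assumption of this sketch.
    """
    pts = embed(lib, E, tau)
    tgt = target[(E - 1) * tau:]  # align target with the embedded points
    if lib_idx is None:
        lib_idx = range(len(pts))
    preds = []
    for i, p in enumerate(pts):
        # E + 1 nearest library neighbors of point i
        nearest = sorted((math.dist(p, pts[j]), j) for j in lib_idx if j != i)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against zero distance
        w = [math.exp(-d / d_min) for d, _ in nearest]
        preds.append(sum(wi * tgt[j] for wi, (_, j) in zip(w, nearest)) / sum(w))
    return pearson(preds, tgt)

def coupled_logistic(n, x=0.4, y=0.2):
    """Toy data: two coupled logistic maps (illustrative parameters)."""
    xs, ys = [], []
    for _ in range(n):
        xs.append(x)
        ys.append(y)
        x, y = x * (3.8 - 3.8 * x - 0.02 * y), y * (3.5 - 3.5 * y - 0.1 * x)
    return xs, ys

xs, ys = coupled_logistic(200)
E = 2

# Screening step: a single rho using every embedded point as library.
rho_full = cross_map_rho(ys, xs, E)

# Convergence step: mean rho over random library subsets of growing size.
rng = random.Random(0)
n_pts = len(ys) - (E - 1)
convergence = {}
for size in (10, 50, 150):
    rhos = [cross_map_rho(ys, xs, E, lib_idx=rng.sample(range(n_pts), size))
            for _ in range(5)]
    convergence[size] = sum(rhos) / len(rhos)

print("full-library rho:", rho_full)
for size, rho in convergence.items():
    print("library size", size, "mean rho", rho)
```

In this sketch, a rho that rises and levels off as the library size grows is what "convergence" refers to. The screening step only ever produces the single full-library value.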

@keichi
Owner Author

keichi commented Sep 21, 2023

The Python xmap() should perform the same as the executable. Can you run the following and paste the results?

import kedm
print(kedm.__file__)
print(kedm.get_kokkos_config())

@davidgwyrick

I figured out where I went wrong in the installation process. After building the source code, I ran "pip install -e ." in the kEDM directory rather than "pip3 install git+https://github.com/keichi/kEDM.git". I think the -e option might have caused it to switch to a no-GPU build, but I'm not sure. It works now! SO FAST! I'm so excited. My code is slow as a snail compared to this.

By checking for convergence, do you mean running different library sizes to see if the CCM rho changes? And if it does, that means the pair of time series may be causally interacting? Seeing your explanation of xmap vs ccm, I think I've always been doing just xmap in my own code. Are you aware of this paper? [https://www.biorxiv.org/content/10.1101/2020.11.23.394916v3.abstract]

How do you estimate the optimal embedding dimension of a time series? We do it by finding the smallest embedding dimension that gives the largest CCM value for a pair of time series. The thought is that you would expect the embedding dimension to change for different pairs of time series, right? But based on the function, it looks like you're estimating the embedding dimension per time series, not per pair of time series?
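
The selection rule described here ("smallest embedding dimension that gives the largest CCM value") can be sketched in a few lines. The function name pick_embedding_dim, the tol parameter, and the rho values below are hypothetical placeholders for illustration, not output from kEDM or any real data set.

```python
def pick_embedding_dim(rho_by_E, tol=0.0):
    """Return the smallest E whose rho is within tol of the best rho.

    rho_by_E maps embedding dimension -> cross-map skill for one pair
    of time series (or self-prediction skill for a single series).
    """
    best = max(rho_by_E.values())
    return min(E for E, rho in rho_by_E.items() if rho >= best - tol)

# Placeholder skills for one hypothetical pair, keyed by embedding dimension.
rho_by_E = {1: 0.62, 2: 0.91, 3: 0.93, 4: 0.93, 5: 0.90}

E_strict = pick_embedding_dim(rho_by_E)             # ties broken toward smaller E
E_lenient = pick_embedding_dim(rho_by_E, tol=0.05)  # accept near-optimal skill

print(E_strict, E_lenient)
```

A per-pair estimate applies this rule to the cross-map skill of each pair, while a per-series estimate applies it once per series to that series' own prediction skill. The tolerance is a design choice that trades a little skill for a lower-dimensional embedding.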

@davidgwyrick

Basically, what I want to figure out is how to use your ccm Python binding to obtain a cross-validated measure of CCM rho. Are the random samples that are tested to calculate CCM rho excluded from the embedding / nearest neighbor search? A related question: what portion of the data is used as the library when you give a library size smaller than the length of the library time series? Does that make sense? Usually I do simple k-fold cross-validation, but with the random samples here I can't.

I apologize for all of the questions, but I am not proficient enough in C++ to really understand the code.
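
For reference, the k-fold scheme mentioned above can be sketched as follows: every point is predicted using only neighbors from the other folds, so tested points never appear in their own neighbor search. This is a pure-Python illustration under stated assumptions, not a description of what kEDM's ccm() does. The helper names, the contiguous fold layout, and the toy coupled logistic maps are all made up for the example.

```python
import math

def embed(x, E, tau=1):
    """Time-delay embedding: each row is (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    start = (E - 1) * tau
    return [[x[t - k * tau] for k in range(E)] for t in range(start, len(x))]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def kfold_cross_map_rho(lib, target, E, k=5, tau=1):
    """Cross-map rho where each point only sees neighbors from other folds."""
    pts = embed(lib, E, tau)
    tgt = target[(E - 1) * tau:]  # align target with the embedded points
    n = len(pts)
    fold_of = [i * k // n for i in range(n)]  # contiguous folds 0..k-1
    preds = []
    for i, p in enumerate(pts):
        # Candidate neighbors: every point outside the fold that contains i.
        train = [j for j in range(n) if fold_of[j] != fold_of[i]]
        nearest = sorted((math.dist(p, pts[j]), j) for j in train)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against zero distance
        w = [math.exp(-d / d_min) for d, _ in nearest]
        preds.append(sum(wi * tgt[j] for wi, (_, j) in zip(w, nearest)) / sum(w))
    return pearson(preds, tgt), fold_of

# Toy data: two coupled logistic maps (illustrative parameters).
x, y = 0.4, 0.2
xs, ys = [], []
for _ in range(200):
    xs.append(x)
    ys.append(y)
    x, y = x * (3.8 - 3.8 * x - 0.02 * y), y * (3.5 - 3.5 * y - 0.1 * x)

rho_cv, fold_of = kfold_cross_map_rho(ys, xs, E=2, k=5)
print("5-fold cross-validated rho:", rho_cv)
```

Contiguous folds are used here so that temporally adjacent points tend to fall in the same fold and are held out together. Random fold assignment would be a different design choice.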

@davidgwyrick

@keichi, hey, I know you're probably super busy, but if you could help me understand the following, I would greatly appreciate it.

  1. The xmap function does not give an option for the number of random samples to be tested, while ccm does. How is xmap evaluating the cross-map skill then? Is it not cross-validated?

  2. For the ccm function, are the random samples that are tested to calculate ccm-rho excluded from the embedding / nearest neighbor search?

  3. Similarly, what portion of the data is used as the library when you give a library size smaller than the length of the library time series?

  4. How do you estimate the optimal embedding dimension of a time series?
