
Record compute time for each SPI #7

Open · olivercliff opened this issue Feb 15, 2022 · 4 comments
Labels: enhancement (New feature or request)

Comments

@olivercliff (Collaborator)

This would be useful for knowing which methods are fast/slow to compute, and would allow users to select faster options.

This might be finicky since many of the methods inherit preprocessed information from other methods (e.g., all spectral methods inherit spectral decompositions).

olivercliff added the enhancement (New feature or request) label on Feb 17, 2022
@anniegbryant (Member)

Just wanted to echo this -- I presented pyspi at CNS2022 and received questions about approximately how long each SPI takes, so that users can estimate the time requirements for a job.

@benfulcher (Collaborator)

Could check whether computing the preprocessed information is fast enough to be neglected for an initial estimate. If so, this could be straightforward to benchmark on a range of simple VAR processes (varying the number of processes and the number of time points), as sketched below.
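
A minimal sketch of such a benchmark, assuming only the Calculator API used elsewhere in this thread; the VAR(1) generator, coupling strength, and the grid of sizes are illustrative choices, not part of pyspi:

import time

import numpy as np
from pyspi.calculator import Calculator

def simulate_var1(n_processes, n_timepoints, coupling=0.2, seed=0):
    # Simulate a stable VAR(1) process: x_t = A @ x_{t-1} + noise,
    # rescaling A so its spectral radius stays below 1.
    rng = np.random.default_rng(seed)
    A = coupling * rng.standard_normal((n_processes, n_processes))
    radius = np.max(np.abs(np.linalg.eigvals(A)))
    if radius >= 0.9:
        A *= 0.9 / radius
    X = np.zeros((n_processes, n_timepoints))
    for t in range(1, n_timepoints):
        X[:, t] = A @ X[:, t - 1] + rng.standard_normal(n_processes)
    return X

# Time the full computation over a small grid of sizes.
for M in (2, 5):
    for T in (100, 500):
        start = time.perf_counter()
        Calculator(dataset=simulate_var1(M, T)).compute()
        print(f"M={M}, T={T}: {time.perf_counter() - start:.1f} s")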

@mesner commented Jan 25, 2024

For posterity, as I'm sure no one else cares after 18 months: here's a code snippet I used for the same question. Yes, it's a hack, and the total time is about 2x what it takes to calculate them all at once (IIRC), but it's something. Note that some SPIs fail in my GPU-less Linux environment.

import time

import numpy as np
import pandas as pd
from pyspi.calculator import Calculator

np.random.seed(42)  # seed numpy, since np.random.randn is used below

M = 2    # number of processes
T = 300  # number of time points

dataset = np.random.randn(M, T)
calc = Calculator(dataset=dataset)

# Time each SPI in isolation by clearing the SPI table and
# reinstating one SPI at a time before calling compute().
spi_items = calc.spis.copy()
df_rows = []
for (k, v) in spi_items.items():
    calc.spis.clear()
    calc.spis[k] = v
    beg_time = time.perf_counter()
    calc.compute()
    calc_time = time.perf_counter() - beg_time
    df_rows.append(dict(spi=k, time=calc_time))

pd.DataFrame(df_rows).to_csv("calc_spi_times.csv", index=False)

Attachment: calc_spi_times.csv

@anniegbryant (Member)

Thank you very much for adding this, @mesner! Very helpful, indeed :)

You bring up an interesting point about computation taking ~2x as long when each SPI is run piecemeal versus all at once, which was also my experience when I tried a similar analysis. I believe @olivercliff designed pyspi with a sort of hierarchical computation scheme, wherein some parent computations are performed once for a given SPI group (e.g., transfer entropy, precision matrices) and then propagate to the individual SPIs therein to save time/computation. So it's a bit tricky to derive the amount of time each individual SPI takes in practice, but I think this is a great approximation for users interested in the relative computation time of each SPI. For example, it makes sense that the convergent cross-mapping (ccm_) SPIs take orders of magnitude longer than most of the other SPIs.
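
To see the shared-computation effect directly, one quick sanity check (a sketch, assuming the timing loop above has already written calc_spi_times.csv) is to compare one full compute() pass against the piecemeal sum:

import time

import numpy as np
import pandas as pd
from pyspi.calculator import Calculator

np.random.seed(42)
dataset = np.random.randn(2, 300)  # same size as in the loop above

# One full pass: parent computations are shared across SPIs.
start = time.perf_counter()
Calculator(dataset=dataset).compute()
full_time = time.perf_counter() - start

# Sum of the per-SPI times recorded piecemeal.
piecemeal_time = pd.read_csv("calc_spi_times.csv")["time"].sum()
print(f"one pass: {full_time:.1f} s; piecemeal sum: {piecemeal_time:.1f} s")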

For what it's worth, we played around with this question using different SPI subset configurations and multivariate time series (MTS) data sizes if you're interested: https://pyspi-toolkit.readthedocs.io/en/latest/faq.html#how-long-does-pyspi-take-to-run
