Feature-by-case request: Nonlinear time-series clustering based on their statistical similarity #427

Open
sa18 opened this issue Nov 2, 2021 · 0 comments
Labels
use case A use case for library or feature

sa18 commented Nov 2, 2021

Given a numeric time series, for example temperature readings.

The task is to color it by segments so that segments with the same color are statistically similar. I would do it like this:

  1. Split the series into equal segments of a given length.
  2. For every pair of segments, perform a statistical similarity test. A result above 70% means the two segments are similar, so they get the same color. Otherwise, they get different colors.
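The two steps above could be sketched roughly as follows in plain Python. This is only an illustration under several assumptions that are not part of the request: the function names (`split_segments`, `ks_statistic`, `color_segments`) are hypothetical, the "result" of the test is taken to be 1 minus the two-sample Kolmogorov-Smirnov statistic, and colors are assigned greedily by comparing each segment with the first segment of every existing color, since pairwise similarity is not transitive.

```python
from bisect import bisect_right


def split_segments(series, length):
    """Split a series into consecutive equal-length segments (a short tail is dropped)."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, length)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # share of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # share of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def color_segments(series, length, threshold=0.7):
    """Return one color index per segment.

    Assumption: similarity = 1 - KS statistic, compared against `threshold`.
    """
    representatives = []  # first segment seen for each color
    colors = []
    for seg in split_segments(series, length):
        for color, rep in enumerate(representatives):
            if 1.0 - ks_statistic(seg, rep) > threshold:
                colors.append(color)
                break
        else:
            # No existing color is similar enough, so start a new one.
            representatives.append(seg)
            colors.append(len(representatives) - 1)
    return colors


series = [0, 1, 2, 3, 100, 101, 102, 103, 0.5, 1.5, 2.5, 3.5]
print(color_segments(series, 4))
```

Comparing against one representative per color keeps the number of tests well below the full set of pairs, at the cost of making the result depend on segment order. A test p-value could be used in place of `1 - D` if that is what the 70% is meant to be.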

Expectation from the math library:

  1. Support for efficient storage of time series (1D in this case, but multidimensional in the general case).
  2. A library of functions for statistical tests (Kolmogorov-Smirnov, Cucconi and others).
  3. Ability to generate permutations, including random ones (required by the Cucconi test implementation), with maximum performance and minimum memory consumption.
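To illustrate the third point, here is a minimal sketch of a permutation test that reuses one compact buffer and shuffles it in place instead of materializing permutations. It is not the Cucconi test: `mean_difference` is a placeholder statistic, and `permutation_p_value` with its parameters is a hypothetical name chosen for this example.

```python
import random
from array import array


def mean_difference(a, b):
    """Placeholder statistic: absolute difference of the sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(a, b, statistic=mean_difference, rounds=1000, seed=0):
    """Estimate a p-value by randomly re-splitting the pooled samples."""
    rng = random.Random(seed)
    pooled = array("d", a)  # compact storage: one C double per value
    pooled.extend(b)
    n = len(a)
    observed = statistic(a, b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # in-place shuffle, no new permutation object
        # The two slices are short-lived copies of the halves.
        if statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)


print(permutation_p_value(list(range(10)), list(range(100, 110)), rounds=200))
```

Memory stays proportional to the pooled sample size regardless of the number of rounds. A different statistic can be passed through the `statistic` argument.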

Here is a (more complicated) description of classification by statistical tests.
