Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Faults on Small Datasets #19

Open
evanharwin opened this issue Oct 29, 2021 · 4 comments
Open

Segmentation Faults on Small Datasets #19

evanharwin opened this issue Oct 29, 2021 · 4 comments
Assignees

Comments

@evanharwin
Copy link

evanharwin commented Oct 29, 2021

What Happened

Get a segmentation fault when running catch22.catch22_all on short lists/numpy.arrays.

What I Expected to Happen

Returns a dictionary of features (perhaps with a lot of NaN-types due to the short timeseries)

Minimum Complete Verifable Example

>>> import catch22
>>> catch22.catch22_all([1,2])
Segmentation fault

Further Details

Can check that this doesn't happen with longer arrays like so:

import catch22
timeseries = list(range(10))
while timeseries:
    print(len(timeseries))
    catch22.catch22_all(timeseries)
    timeseries = timeseries[:-1]
@benfulcher
Copy link
Collaborator

Thanks for the clear description. Would indeed be preferable to have the behavior you describe (for features that require a minimum length, adding an early catch for time-series length with a NaN output if under minimum length) would help avoid seg faults. @chlubba do you have bandwidth? Otherwise @hendersontrent might be able to help.

@hendersontrent
Copy link
Collaborator

There is some code @chlubba wrote in main.c to determine if code can be run in catch22 (lines 23-45 here), but the main.c operations don't get used outside of raw C, so the wrappers (e.g., Python) are not performing this check. We could either add a check to each of the 22 feature functions, or add a single call within the core Python/Matlab/etc catch22_all() function that does this prior to computing anything. Happy to consider an alternative if I have misunderstood.

@chlubba
Copy link
Collaborator

chlubba commented Nov 2, 2021

Hi @evanharwin, @hendersontrent and @benfulcher,

what you, @hendersontrent, proposed, namely to add the check in the core wrapping functions, makes the most sense to me.

I could find the time to do this in ~2 weeks. If you are able to add this check earlier, thanks for doing so!

Best,
Carl

@hendersontrent
Copy link
Collaborator

hendersontrent commented Nov 8, 2021

I just ran each feature individually in Rcatch22 to test which ones error on a T = 2 time series, and CO_Embed2_Dist_tau_d_expfit_meandiff was the only feature to do so... Maybe the fix only needs to go into the C code for that 1 feature @benfulcher?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants