Saving of ChannelSliceRecordings inefficient/basically unusable #2328

Closed · hornauerp opened this issue Dec 13, 2023 · 26 comments
Labels: performance (Performance issues/improvements)

Comments
@hornauerp commented Dec 13, 2023

I tried to sort concatenated ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I then tried to save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using sliced_recording.save_to_folder(save_path, n_jobs=-1), which also failed after a few minutes. Importantly, the progress bar did not move and was stuck at 0% 0/601 [23:22<?, ?it/s], indicating that writing had not even started. I then tried increasing the number of cores (up to 72) and the amount of RAM available (up to 1 TB), but neither helped. Checking the resource monitor, I saw that no matter how much RAM I provided, it filled up completely and the job then crashed with an error like this:

Traceback (most recent call last):
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f30c8d7d490 state=cancelled>

A process in the process pool was terminated abruptly while the future was running or pending.

With n_jobs=1 the progress bar did start filling up, but writing the recording mentioned above would have taken ~77h. My suspicion is that the full recording is loaded into memory for every job and every chunk, but I had a hard time finding the relevant code.

All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.

Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!

EDIT: When saving the full recording or FrameSliceRecordings, performance is fast as expected, so the problem must be specific to ChannelSliceRecordings.
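
For reference, a minimal sketch of the two cases being compared (paths, slice bounds, and channel count are placeholders):

import numpy as np
import spikeinterface.full as si

rec = si.MaxwellRecordingExtractor('/path/to/maxwell/recording',
                                   stream_id='well000', rec_name='rec0000')

# Frame slice: saving this works fine and is fast.
frame_rec = rec.frame_slice(start_frame=0, end_frame=rec.get_num_frames() // 2)
frame_rec.save_to_folder(folder='/save/path/frame_slice', n_jobs=-1)

# Channel slice with an unordered channel subset: this is the case that hangs.
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
chan_rec = rec.channel_slice(sel_channels)
chan_rec.save_to_folder(folder='/save/path/channel_slice', n_jobs=-1)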

@alejoe91 (Member)

Thanks @hornauerp

Can you share the entire script up to the save function?

@hornauerp (Author) commented Dec 14, 2023

Hi Alessio!
Sure, I think the MRE would be along these lines:

import numpy as np
import spikeinterface.full as si

rec_path, stream_id, rec_name = '/path/to/maxwell/recording', 'well000', 'rec0000'
rec = si.MaxwellRecordingExtractor(rec_path, stream_id=stream_id, rec_name=rec_name)
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
slice_rec = rec.channel_slice(sel_channels, renamed_channel_ids=list(range(len(sel_channels))))

save_path = '/save/path'
slice_rec.save_to_folder(folder=save_path, n_jobs=-1)

@samuelgarcia (Member)

If you do not slice, does the save work?

@alejoe91 (Member)

I think the problem could be that slicing the HDF5 dataset with unordered indices is very inefficient... (see https://github.com/NeuralEnsemble/python-neo/blob/master/neo/rawio/maxwellrawio.py#L203)
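
For illustration, the usual workaround is to read in sorted order and restore the requested order afterwards; a minimal sketch (the file path and the 'sig' dataset name are assumptions for this example, not the actual neo code):

import numpy as np
import h5py

requested = np.array([37, 5, 112, 9])  # unordered channel indices

with h5py.File('/path/to/maxwell/recording', 'r') as f:
    dset = f['sig']  # dataset name is an assumption for this sketch

    # Read in sorted order (fast, and generally required by h5py fancy indexing),
    # then put the rows back into the originally requested order.
    order = np.argsort(requested)
    data_sorted = dset[np.sort(requested), 0:10000]
    data = np.empty_like(data_sorted)
    data[order] = data_sorted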

@hornauerp could you share a test dataset with me? Maybe drop it in a Google Drive folder or send us a share link from ETH?

@hornauerp (Author)

@samuelgarcia Yes, both the full recording and FrameSliceRecordings work fine. I agree with Alessio that it is probably the unordered channel indices that cause the problem.

@alejoe91 Yes, I will look for a small one and send you the link.

@alejoe91 (Member)

Not too small ;)

@hornauerp (Author)

The file size really seems to be the main issue. I tried the code with a 5 min recording (3.3 GB) and it finished pretty quickly. The same code with a 30 min recording is now using 300 GB of RAM and still increasing.

@alejoe91 (Member)

Maybe it's some garbage collector issue...

Do you have NEO installed from source? In that case, can you try to add import gc here and gc.collect() here?
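
A conceptual sketch of that suggestion (not the actual neo code, just the pattern of forcing a collection pass after each chunk read):

import gc
import numpy as np

def read_chunk(dset, start, stop):
    # Copy the HDF5 selection into a plain numpy array ...
    chunk = np.asarray(dset[:, start:stop])
    # ... and force a garbage-collection pass so buffers from previous
    # reads are released before the next chunk is requested.
    gc.collect()
    return chunk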

alejoe91 added the performance label on Dec 14, 2023

@hornauerp (Author)

I tried it and it did not change anything. The memory explodes before the file is even written (the progress bar does not start moving), so I assume the issue occurs before the garbage collection step.

@alejoe91 (Member)

Tagging @h-mayorquin because he likes this stuff :)

@h-mayorquin (Collaborator) commented Dec 14, 2023

Can you share the script you used?

[EDIT]
Oh, it's probably this?

import numpy as np
import spikeinterface.full as si

rec_path, stream_id, rec_name = '/path/to/maxwell/recording', 'well000', 'rec0000'
rec = si.MaxwellRecordingExtractor(rec_path, stream_id=stream_id, rec_name=rec_name)
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
slice_rec = rec.channel_slice(sel_channels, renamed_channel_ids=list(range(len(sel_channels))))

save_path = '/save/path'
slice_rec.save_to_folder(folder=save_path, n_jobs=-1)

And the Maxwell recording is the one you shared.

@hornauerp (Author)

Yes exactly, this should reproduce the problem.

@alejoe91 (Member)

Philipp, I couldn't reproduce the issue on my Ubuntu machine.
Are you running this on Windows?

Here is how it ran on my machine on the dataset you shared:
[screenshot: save progress]

Profile:
[screenshot: memory profile]

It only used <1 GB of RAM throughout the save process.

@h-mayorquin (Collaborator)

I can test it on Windows, but can you tell us how you are measuring RAM?

@hornauerp (Author)

On our Ubuntu server. Here is an example with n_jobs=72:
[screenshot: resource monitor]

@alejoe91 (Member)

I could also run a save with 5 concatenated sliced recordings with no problem:
[screenshot: save progress]

Is it possible that this is related to the Python version? Which Python version are you currently on?

@hornauerp (Author)

Python 3.10, but it might also be the h5py version. Which one are you using?
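
For reference, a quick way to print the relevant versions:

import h5py, neo, spikeinterface
print(h5py.__version__, neo.__version__, spikeinterface.__version__)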

@hornauerp (Author)

Your writing speed seems much higher in general, but that is probably just the hardware difference.

@alejoe91 (Member)

h5py                              3.10.0

@alejoe91 (Member)

Yeah I'm writing from a local HDD to a local SSD :)

@hornauerp (Author)

I also use the same h5py version.

But I think I might have a clue. When I save the full MaxWell recording, load it again as a BinaryFolderRecording using loaded_rec = si.load_extractor(save_path), and then slice it, saving works fine. I assume the IO for the BinaryFolderRecording is different from the MaxWell one, which explains the problem in my case.
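
A minimal sketch of that workaround, reusing rec and sel_channels from the MRE above (paths are placeholders):

import spikeinterface.full as si

full_path = '/save/path/full_binary'
rec.save_to_folder(folder=full_path, n_jobs=-1)      # write the full recording once

loaded_rec = si.load_extractor(full_path)            # comes back as a BinaryFolderRecording
slice_rec = loaded_rec.channel_slice(sel_channels)   # slicing the binary copy
slice_rec.save_to_folder(folder='/save/path/sliced', n_jobs=-1)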

@alejoe91 (Member)

That's for sure, but in my test I was reading the Maxwell file directly, not the binary.

@h-mayorquin (Collaborator) commented Dec 14, 2023

I think it is very important to know how you are measuring memory. There is a big difference between rss and virtual size. I think the Ubuntu system monitor measures rss minus shared memory, but we need to know exactly which metric is exploding. I suspect it is the heap size that is exploding.

[EDIT: Actually, that's about reading; I would need an equivalent for writing.]
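
For example, polling the saving process with psutil reports both metrics (illustrative snippet, not part of spikeinterface):

import os
import psutil

proc = psutil.Process(os.getpid())       # or psutil.Process(<pid of a worker>)
mem = proc.memory_info()
print(f"rss: {mem.rss / 1e9:.2f} GB")    # resident set size (physical RAM in use)
print(f"vms: {mem.vms / 1e9:.2f} GB")    # virtual size (includes mapped but unused pages)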

@hornauerp (Author)

I tried it in a new env with Python 3.11, installed spikeinterface and neo from source, and I still run into the same problem.

@hornauerp (Author)

Never mind, it seems we forgot to revert some changes in the maxwellrawio.py file when we debugged the shuffled channel issue #1691 @alejoe91. I think I never noticed the bug because the axon tracking files are usually pretty small, so the difference in memory usage was too small to be noticeable. It works as intended now, sorry for the hassle!
