Saving of ChannelSliceRecordings inefficient/basically unusable #2328

Closed · hornauerp opened this issue Dec 13, 2023 · 26 comments
Labels: performance (Performance issues/improvements)

Comments
@hornauerp commented Dec 13, 2023

I tried to sort concatenated ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I then tried to save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using sliced_recording.save_to_folder(save_path, n_jobs=-1), which also failed after a few minutes. Importantly, the progress bar did not move and was stuck at 0% 0/601 [23:22<?, ?it/s], indicating that writing had not even started. I then tried increasing the number of cores (up to 72) and the amount of RAM available (up to 1 TB), but neither helped. Checking the resource monitor, I saw that no matter how much RAM I provided, it filled up completely and the job then crashed with an error like this:

Traceback (most recent call last):
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/home/phornauer/miniconda3/envs/si_env/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7f30c8d7d490 state=cancelled>

A process in the process pool was terminated abruptly while the future was running or pending.

With n_jobs=1 the progress bar did start filling up, but writing the recording mentioned above would have taken ~77h. My suspicion is that the full recording is loaded into memory for every job and every chunk, but I had a hard time finding the relevant code.

All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.

Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!

EDIT: When saving the full recording or FrameSliceRecordings, performance is fast as expected, so the problem must be specific to ChannelSliceRecordings.
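
For reference, a minimal sketch of the two cases being compared (paths, slice bounds, and channel count are placeholders):

import numpy as np
import spikeinterface.full as si

rec = si.MaxwellRecordingExtractor('/path/to/maxwell/recording',
                                   stream_id='well000', rec_name='rec0000')

# Frame slice: saving this works fine and is fast.
frame_rec = rec.frame_slice(start_frame=0, end_frame=rec.get_num_frames() // 2)
frame_rec.save_to_folder(folder='/save/path/frame_slice', n_jobs=-1)

# Channel slice with an unordered channel subset: this is the case that hangs.
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
chan_rec = rec.channel_slice(sel_channels)
chan_rec.save_to_folder(folder='/save/path/channel_slice', n_jobs=-1)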

@alejoe91 (Member)

Thanks @hornauerp

Can you share the entire script up to the save function?

@hornauerp (Author) commented Dec 14, 2023

Hi Alessio!
Sure, I think the MRE would be along these lines:

import numpy as np
import spikeinterface.full as si

rec_path, stream_id, rec_name = '/path/to/maxwell/recording', 'well000', 'rec0000'
rec = si.MaxwellRecordingExtractor(rec_path, stream_id=stream_id, rec_name=rec_name)
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
slice_rec = rec.channel_slice(sel_channels, renamed_channel_ids=list(range(len(sel_channels))))

save_path = '/save/path'
slice_rec.save_to_folder(folder=save_path, n_jobs=-1)

@samuelgarcia (Member)

If you do not slice, does the save work?

@alejoe91 (Member)

I think the problem could be that slicing the HDF5 dataset with unordered indices is very inefficient... (see https://github.com/NeuralEnsemble/python-neo/blob/master/neo/rawio/maxwellrawio.py#L203)
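
For illustration, the usual workaround is to read in sorted order and restore the requested order afterwards; a minimal sketch (the file path and the 'sig' dataset name are assumptions for this example, not the actual neo code):

import numpy as np
import h5py

requested = np.array([37, 5, 112, 9])  # unordered channel indices

with h5py.File('/path/to/maxwell/recording', 'r') as f:
    dset = f['sig']  # dataset name is an assumption for this sketch

    # Read in sorted order (fast, and generally required by h5py fancy indexing),
    # then put the rows back into the originally requested order.
    order = np.argsort(requested)
    data_sorted = dset[np.sort(requested), 0:10000]
    data = np.empty_like(data_sorted)
    data[order] = data_sorted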

@hornauerp could you share a test dataset with me? Maybe drop it in a Google Drive folder or send us a share link from ETH?

@hornauerp (Author)

@samuelgarcia Yes, both the full recording and FrameSliceRecordings work fine. I agree with Alessio that it is probably the unordered channel indices that cause the problem.

@alejoe91 Yes, I will look for a small one and send you the link.

@alejoe91 (Member)

Not too small ;)

@hornauerp (Author)

The file size really seems to be the main issue. I tried the code with a 5 min recording (3.3 GB) and it finished pretty quickly. The same code with a 30 min recording is now using 300 GB of RAM and still increasing.

@alejoe91 (Member)

Maybe it's some garbage collector issue...

Do you have NEO installed from source? In that case, can you try to add import gc here and gc.collect() here?
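
A conceptual sketch of that suggestion (not the actual neo code, just the pattern of forcing a collection pass after each chunk read):

import gc
import numpy as np

def read_chunk(dset, start, stop):
    # Copy the HDF5 selection into a plain numpy array ...
    chunk = np.asarray(dset[:, start:stop])
    # ... and force a garbage-collection pass so buffers from previous
    # reads are released before the next chunk is requested.
    gc.collect()
    return chunk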

alejoe91 added the performance label on Dec 14, 2023

@hornauerp (Author)

I tried it and it did not change anything. The memory explodes before the file is even written (the progress bar does not start moving), so I assume the issue occurs before the garbage collection step.

@alejoe91 (Member)

Tagging @h-mayorquin because he likes this stuff :)

@h-mayorquin (Collaborator) commented Dec 14, 2023

Can you share the script you used?

[EDIT]
Oh, it's probably this?

import numpy as np
import spikeinterface.full as si

rec_path, stream_id, rec_name = '/path/to/maxwell/recording', 'well000', 'rec0000'
rec = si.MaxwellRecordingExtractor(rec_path, stream_id=stream_id, rec_name=rec_name)
sel_channels = np.random.choice(rec.get_channel_ids(), size=100, replace=False)
slice_rec = rec.channel_slice(sel_channels, renamed_channel_ids=list(range(len(sel_channels))))

save_path = '/save/path'
slice_rec.save_to_folder(folder=save_path, n_jobs=-1)

And the Maxwell recording is the one you shared.

@hornauerp (Author)

Yes exactly, this should reproduce the problem.

@alejoe91 (Member)

Philipp, I couldn't reproduce the issue on my Ubuntu machine.
Are you running this on Windows?

Here is how it ran on my machine on the dataset you shared:
[screenshot: save progress]

Profile:
[screenshot: memory profile]

It only used <1 GB of RAM throughout the save process.

@h-mayorquin (Collaborator)

I can test it on Windows, but can you tell us how you are measuring RAM?

@hornauerp (Author)

On our Ubuntu server. Here is an example with n_jobs=72:
[screenshot: resource monitor]

@alejoe91 (Member)

I could also run a save with 5 concatenated sliced recordings with no problem:
[screenshot: save progress]

Is it possible that this is related to the Python version? Which Python version are you currently on?

@hornauerp (Author)

Python 3.10, but it might also be the h5py version. Which one are you using?
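
For reference, a quick way to print the relevant versions:

import h5py, neo, spikeinterface
print(h5py.__version__, neo.__version__, spikeinterface.__version__)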

@hornauerp (Author)

Your writing speed seems much higher in general, but that is probably just the hardware difference.

@alejoe91 (Member)

h5py                              3.10.0

@alejoe91 (Member)

Yeah I'm writing from a local HDD to a local SSD :)

@hornauerp (Author)

I also use the same h5py version.

But I think I might have a clue. When I save the full MaxWell recording, load it again as a BinaryFolderRecording using loaded_rec = si.load_extractor(save_path), and then slice it, saving works fine. I assume the IO for the BinaryFolderRecording is different from the MaxWell one, which explains the problem in my case.
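
A minimal sketch of that workaround, reusing rec and sel_channels from the MRE above (paths are placeholders):

import spikeinterface.full as si

full_path = '/save/path/full_binary'
rec.save_to_folder(folder=full_path, n_jobs=-1)      # write the full recording once

loaded_rec = si.load_extractor(full_path)            # comes back as a BinaryFolderRecording
slice_rec = loaded_rec.channel_slice(sel_channels)   # slicing the binary copy
slice_rec.save_to_folder(folder='/save/path/sliced', n_jobs=-1)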

@alejoe91 (Member)

That's for sure, but in my test I was reading the Maxwell file directly, not the binary.

@h-mayorquin (Collaborator) commented Dec 14, 2023

I think it is very important to know how you are measuring memory. There is a big difference between rss and virtual size. I think the Ubuntu system monitor measures rss minus shared memory, but we need to know exactly which metric is exploding. I suspect it is the heap size that is exploding.

[EDIT: Actually, that's about reading; I would need an equivalent for writing.]
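
For example, polling the saving process with psutil reports both metrics (illustrative snippet, not part of spikeinterface):

import os
import psutil

proc = psutil.Process(os.getpid())       # or psutil.Process(<pid of a worker>)
mem = proc.memory_info()
print(f"rss: {mem.rss / 1e9:.2f} GB")    # resident set size (physical RAM in use)
print(f"vms: {mem.vms / 1e9:.2f} GB")    # virtual size (includes mapped but unused pages)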

@hornauerp (Author)

I tried it in a new env with Python 3.11, installed spikeinterface and neo from source, and I still run into the same problem.

@hornauerp (Author)

Never mind, it seems we forgot to revert some changes in the maxwellrawio.py file when we debugged the shuffled channel issue #1691 @alejoe91. I think I never noticed the bug because the axon tracking files are usually pretty small, so the difference in memory usage was too small to be noticeable. It works as intended now, sorry for the hassle!
