Saving of ChannelSliceRecordings inefficient/basically unusable #2328
Comments
Thanks @hornauerp. Can you share the entire script up to the
Hi Alessio!
If you do not slice, does the save work?
I think the problem could be that slicing the HDF5 dataset with non-ordered indices is very inefficient (see https://github.com/NeuralEnsemble/python-neo/blob/master/neo/rawio/maxwellrawio.py#L203). @hornauerp could you share a test dataset with me? Maybe drop it in a Google Drive folder or send us a share link from ETH?
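For context, a common way around the slow path is to hand h5py a sorted index list and restore the requested channel order in memory afterwards. This is only a sketch of that idea, not the actual neo code; the dataset layout (channels x samples) and the function name are assumptions:

```python
import numpy as np
import h5py

def read_channels(h5_path, dset_name, channel_indexes, i_start, i_stop):
    """Read a subset of channels from a 2D HDF5 dataset (channels x samples).

    h5py's fancy indexing expects increasing indices, so sort them for the
    HDF5 read and undo the permutation afterwards instead of indexing the
    file out of order.
    """
    channel_indexes = np.asarray(channel_indexes)
    order = np.argsort(channel_indexes)        # permutation that sorts the indices
    sorted_idx = channel_indexes[order]
    with h5py.File(h5_path, "r") as f:
        data = f[dset_name][sorted_idx, i_start:i_stop]  # one ordered HDF5 read
    return data[np.argsort(order)]             # back to the caller's channel order
```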
@samuelgarcia Yes, both the full recording and FrameSliceRecordings work fine. I agree with Alessio that it is probably the unordered channel indices that cause the problem. @alejoe91 Yes, I will look for a small one and send you the link.
Not too small ;)
The file size really seems to be the main issue. I tried the code with a 5 min recording (3.3 GB) and it finished pretty quickly. The same code with a 30 min recording is now using up 300 GB RAM and still increasing.
I tried it and it did not change the problem. The memory explodes already before the file is actually written (progress bar does not start moving), so I assume the issue is before the garbage collection.
Tagging @h-mayorquin because he likes this stuff :)
Can you share the script you used? [EDIT]
And the Maxwell recording is the one you shared.
Yes exactly, this should reproduce the problem.
I can test it on Windows, but can you tell us how you are measuring RAM?
Python 3.10, but it might also be the h5py version. Which one are you using?
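For anyone comparing setups, a quick way to print the relevant versions (assuming all of these packages are installed in the environment being tested):

```python
import sys
import h5py
import neo
import spikeinterface

print("python:", sys.version.split()[0])
print("h5py:", h5py.__version__)
print("neo:", neo.__version__)
print("spikeinterface:", spikeinterface.__version__)
```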
Your writing speed seems much higher in general, but that is probably just the hardware difference.
Yeah, I'm writing from a local HDD to a local SSD :)
I also use the same h5py version. But I think I might have a clue. When I save the full MaxWell recording and then load it again as a BinaryFolderRecording using
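For reference, a sketch of that save-to-binary-first route with the standard SpikeInterface API; the file path, job settings, and channel selection below are placeholders, not the values from this issue:

```python
import spikeinterface as si
import spikeinterface.extractors as se

# Save the full MaxWell recording to a binary folder first ...
rec = se.read_maxwell("recording.raw.h5")
rec.save(folder="full_recording_binary", n_jobs=8, chunk_duration="1s")

# ... load it back as a BinaryFolderRecording, then slice the channels.
# Slicing the flat binary copy avoids the unordered HDF5 reads entirely.
rec_binary = si.load_extractor("full_recording_binary")
sliced = rec_binary.channel_slice(channel_ids=rec_binary.channel_ids[:355])
```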
That's for sure, but in my test I was reading the Maxwell file directly, not the binary.
I think it is very important to know how you are measuring memory. There is a big difference between RSS and virtual size. I think the Ubuntu system monitor measures RSS minus shared, but we need to know exactly which metric is exploding. I suspect the heap size is what is exploding. [EDIT: Actually, that's about reading; I would need an equivalent for writing.]
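One way to pin that down is to log both RSS and virtual size around the save call with psutil; a minimal sketch, where the commented-out save line stands in for whatever command is being profiled:

```python
import os
import psutil

proc = psutil.Process(os.getpid())

def report(label):
    """Print resident set size and virtual size of this process in GiB."""
    info = proc.memory_info()
    print(f"{label}: rss={info.rss / 1024**3:.2f} GiB, vms={info.vms / 1024**3:.2f} GiB")

report("before save")
# sliced_recording.save_to_folder(save_path, n_jobs=1)  # the call being profiled
report("after save")
```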
Tried it in a new env with Python 3.11 and installed spikeinterface and neo from source, and I still run into the same problem.
Nevermind, it seems like we forgot to revert some changes in the maxwellrawio.py file when we debugged the shuffled channel issue #1691 @alejoe91. I think I never realized the bug because the axon tracking files are usually pretty small and the differences in memory usage were too small to be noticeable. It works now as intended, sorry for the hassle!
I tried to sort a concatenation of ChannelSliceRecordings (MaxWell recordings), which failed when writing the binary recording. I went on to try and save one of the ChannelSliceRecordings individually (ChannelSliceRecording: 355 channels - 10.0kHz - 1 segments - 18,000,600 samples - 1,800.06s (30.00 minutes) - uint16 dtype - 11.90 GiB) using `sliced_recording.save_to_folder(save_path, n_jobs=-1)`, which also failed after a few minutes. Importantly, the progress bar did not move and was stuck at `0% 0/601 [23:22<?, ?it/s]`, indicating that it had not even started writing the file. I then tried to increase the number of cores (up to 72) and the amount of RAM available (up to 1 TB), but none of it helped. Checking the resource monitor, I saw that no matter how much RAM I provided, it would fill up completely and then crash with an error like this:

When trying with `n_jobs=1`, the progress bar would start filling up, but writing the recording mentioned above would have taken ~77 h. My suspicion is that for every job and every chunk the full recording is loaded into memory, but I had a hard time finding the code related to this issue.

All of this was run in Jupyter notebooks on our server (Ubuntu 18.04) with the most recent version of spikeinterface.
Since I have some time pressure to analyze this data, I would really appreciate any help in speeding up this process. Thank you!
EDIT: When saving the full recording or FrameSliceRecordings, the performance is, as expected, pretty fast, so the problem must be specific to ChannelSliceRecordings.
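A quick way to narrow this down further would be to time a single one-second chunk read from the channel-sliced recording versus the full one. This is only a sketch; `full_recording` and `sliced_recording` stand for the objects described above, and the 1 s chunk size is arbitrary:

```python
import time

for name, rec in [("full", full_recording), ("channel-sliced", sliced_recording)]:
    fs = int(rec.get_sampling_frequency())
    t0 = time.perf_counter()
    chunk = rec.get_traces(start_frame=0, end_frame=fs)  # one 1 s chunk
    print(f"{name}: shape {chunk.shape}, read in {time.perf_counter() - t0:.2f} s")
```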