GPU usage by Kilosort4 when running with run_sorter_by_property #3591
Comments
Hi @jazlynntan, I don't know for sure and this is just a quick guess, but I'm not sure what happens to GPU access when parallelising multiple sortings over separate cores. Presumably the separate processes all attempt to compute on the GPU, but judging from the runtime they are not accessing it sequentially in any useful way. It might be worth testing by running with
That was Sam's idea too in another issue (I forget which one). He recommended doing
Also, we might need to add a note to our docs explaining that joblib might not play well with GPU-based sorters. Not sure, but this is the second issue related to this.
Hi, I tried the first suggestions:
I think the same problem persists? GPU memory is used but not the computation. The whole sorting for the single shank took about 5h and the resource report is as follows:
I'm now attempting to use 'loop' for the engine and 16 jobs. I'll update again when it's done.
Yep, just let us know if loop solves it. Might take longer to do though!
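For anyone following along, here is a minimal sketch of the sequential ('loop') run discussed above, next to the joblib variant. It assumes a recent SpikeInterface in which `run_sorter_by_property` accepts `folder`, `engine` and `engine_kwargs`, and that the recording carries a per-shank `group` property. The helper names `sorter_call_kwargs` and `sort_by_shank`, and the `recording` and `folder` arguments, are hypothetical and not taken from the original post.

```python
def sorter_call_kwargs(engine="loop", n_jobs=1, torch_device="cuda"):
    """Collect keyword arguments for run_sorter_by_property (hypothetical helper)."""
    # Only the two engines discussed in this thread are handled here.
    if engine not in ("loop", "joblib"):
        raise ValueError(f"unsupported engine for this sketch: {engine!r}")
    kwargs = {
        "sorter_name": "kilosort4",
        "grouping_property": "group",   # assumption: shanks are stored under "group"
        "engine": engine,               # "loop" sorts one group after another
        "verbose": True,
        "torch_device": torch_device,   # Kilosort4 parameter mentioned in this thread
    }
    if engine == "joblib":
        # With joblib, several worker processes may try to use the same GPU at once.
        kwargs["engine_kwargs"] = {"n_jobs": n_jobs}
    return kwargs


def sort_by_shank(recording, folder, **overrides):
    """Run Kilosort4 once per group; needs SpikeInterface and Kilosort4 installed."""
    # Imported lazily so the helper above can be used without SpikeInterface.
    from spikeinterface.sorters import run_sorter_by_property

    return run_sorter_by_property(
        recording=recording, folder=folder, **sorter_call_kwargs(**overrides)
    )
```

Calling `sort_by_shank(recording, "ks4_by_shank")` would use the sequential engine, while `sort_by_shank(recording, "ks4_by_shank", engine="joblib", n_jobs=4)` reproduces the parallel setup suspected of causing the slowdown.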
Hello,
I'm running kilosort4 for a single shank using run_sorter_by_property(). When I ran Kilosort4 independently in the same conda environment, the same data (with all 4 shanks) took about 1.5 h. However, a single shank within spikeinterface took about 6 h. This leads me to suspect that the GPU is not being used.
This is the output while kilosort within spikeinterface was running:
The GPU memory seems to be used by the process but the speed seems to suggest that the GPU is not used for computation. Meanwhile the CPU usage appeared to be maxed out.
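One way to tell "memory allocated" apart from "actually computing" is to sample GPU utilisation while the sorter runs. The sketch below is only an illustration: it assumes `nvidia-smi` is on the PATH, and the function names and thresholds are hypothetical choices, not part of SpikeInterface or Kilosort.

```python
import subprocess

# Query utilisation (%) and memory (MiB) for every GPU, as plain CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Turn the CSV output of the query above into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        rows.append(
            {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return rows


def gpu_is_idle_but_allocated(row, util_threshold=5.0, mem_threshold_mib=500.0):
    """True when memory is held on the GPU but almost no compute is happening."""
    return (
        row["util_percent"] < util_threshold
        and row["mem_used_mib"] > mem_threshold_mib
    )


def sample_gpu():
    """Run nvidia-smi once and return the parsed rows (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

Calling `sample_gpu()` every few seconds during a sort and checking `gpu_is_idle_but_allocated` on each row would show whether the pattern described here (memory in use, little compute) holds over the whole run or only during CPU-bound stages.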
This is the code I'm using:
I tried using 'auto' and 'cuda' for the torch_device parameter, but both faced the same issue.
May I know if I am doing something wrong? Thank you!
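A quick sanity check that may help narrow this down is to confirm that PyTorch can see the GPU from the same environment (and, ideally, from inside the process that runs the sorter). This is a minimal sketch using standard `torch.cuda` calls; the function name `torch_cuda_report` is hypothetical.

```python
def torch_cuda_report():
    """Summarise what PyTorch can see; works even if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Name of the first visible device, e.g. to confirm the expected card.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    print(torch_cuda_report())
```

If `cuda_available` is `False` here, the `torch_device` setting cannot help and the sorter will most likely fall back to the CPU; if it is `True`, the slowdown is more likely related to how the sorting jobs are launched.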