Unable to use multiple GPUs – CUDA Out of Memory issue #105
Comments
Hey @Devloper-RG, thanks for raising this issue and testing the lib in a multi-GPU setup 🙏
I guess the issue here is that we are pushing to …
Hey @eustlb, thanks for getting back to me! I made some modifications to the code to use the … Thereafter, I ran the server on a Google Cloud Platform (GCP) VM with 2 NVIDIA T4 GPUs.

During testing, I noticed that one of the GPUs consistently overloads, leading to a CUDA out-of-memory error. I tried using the DataParallel method, but it didn't resolve the issue. I also attempted to run the model in lower precision, which worked on a single GPU, but I'd like to use higher-precision models and fully leverage multiple GPUs for better performance.

Any help with getting multi-GPU support working would be greatly appreciated!
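One thing worth noting: `nn.DataParallel` replicates the full model on every GPU, so it does not reduce per-GPU memory usage; sharding the model across devices does. Below is a minimal sketch (not the project's actual loading code, just an illustration assuming the model is loaded through transformers/accelerate) of loading in fp16 with the layers split across the available GPUs:

```python
# Hypothetical sketch: shard a large LM across available GPUs instead of
# replicating it per GPU with nn.DataParallel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # lower precision so an 8B model can fit on ~14.5 GiB T4s
    device_map="auto",          # let accelerate place layers across the GPUs
)

# Inputs go to the device holding the first layers (usually cuda:0);
# accelerate's hooks move activations between devices during the forward pass.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```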
Hi @Devloper-RG, if you can give a snippet with some reproducible code, that would be very helpful. Otherwise, we can't know what your issue is. We have run this on a setup with multiple GPUs without any problems.
While using other models like meta-llama/Meta-Llama-3.1-8B-Instruct, I'm encountering a torch.OutOfMemoryError when trying to load the model on multiple GPUs. I have 4 GPUs, each with 14.57 GiB of memory, but the model fails to allocate memory on GPU 0, even though the other GPUs should share the load.
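A common workaround when GPU 0 alone runs out of memory is to cap how much of the model `device_map="auto"` may place on it, since GPU 0 usually also holds inputs, activations, and generation buffers. A hedged sketch, with memory caps that are only illustrative for ~14.5 GiB cards:

```python
# Hypothetical sketch: reserve headroom on GPU 0 so the auto device map
# spreads more of the weights onto the other GPUs.
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    # Illustrative caps for four ~14.5 GiB GPUs; tune to your hardware.
    max_memory={0: "10GiB", 1: "13GiB", 2: "13GiB", 3: "13GiB"},
)

# Check where each module ended up; all four GPUs should appear here.
print(model.hf_device_map)
```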