Multi-GPU issues #281
Comments
Bug in HIP or ROCm. On NVIDIA the split works. The other bug is an OOM when the model isn't dispatched properly, so it runs out of memory during inference.
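As a side note on the OOM point, one way to size a split is to check free VRAM per device first and leave headroom for the cache and activations. A minimal, generic PyTorch sketch (not exllama-specific):

```python
import torch

# Print free/total VRAM per device (in GiB) so a split can be chosen with
# headroom left for the KV cache and activations used during inference.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # returns (free_bytes, total_bytes)
    print(f"cuda:{i}: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")
```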
Thanks for your reply... I've raised the issue on HIP's GitHub support thread:
Just in case you haven't tried it yet, the --gpu_peer_fix argument (corresponding entry in
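For anyone following along, a rough sketch of what toggling that looks like in code, assuming the config attribute mirrors the CLI flag (names are from memory, so double-check against the repo):

```python
# Sketch only: assumes exllama's ExLlamaConfig exposes gpu_peer_fix and
# set_auto_map(); verify the attribute names against model.py before relying on this.
from model import ExLlamaConfig  # import path assumes the exllama repo layout

config = ExLlamaConfig("/path/to/model/config.json")   # placeholder path
config.model_path = "/path/to/model.safetensors"       # placeholder path
config.set_auto_map("10,20")   # e.g. ~10 GB on GPU 0, ~20 GB on GPU 1, leaving headroom
config.gpu_peer_fix = True     # route GPU-to-GPU copies through system RAM
```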
Thanks for your reply, and for your excellent coding; it's great when it works... I looked into this and had trouble finding how to do such a thing... I have been looking for (but have yet to find again) a page that I found...
Yep, I'm thinking another thing to explore would be the use of
I got a reply on the Oobabooga posting about the passing... Thanks for your replies... I have been thinking of this, so I'll mention it. Related to this, instead of splitting the model across GPUs,
Cache and state have to reside on the same device as the associated weights. You can't do CUDA operations across devices, and while you could store just the cache on a separate device, it would be slower than just swapping it to system RAM, which is still slow enough to be kind of useless.
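The cross-device restriction is easy to see with plain PyTorch; a small illustration (not exllama code), assuming two visible GPUs:

```python
import torch

# Requires at least two CUDA devices to demonstrate the point.
if torch.cuda.device_count() >= 2:
    weights = torch.randn(4096, 4096, device="cuda:0")  # layer weights on GPU 0
    cache = torch.randn(1, 4096, device="cuda:1")       # cache placed on GPU 1

    try:
        _ = cache @ weights                             # cross-device op: fails
    except RuntimeError as err:
        print("cross-device matmul failed:", err)

    # Moving the cache to the weights' device works, but doing that copy every
    # token is roughly as costly as paging the cache through system RAM.
    _ = cache.to("cuda:0") @ weights
```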
I guess I forgot to answer here: this is the same issue as #173, which was fixed upstream and will be available in the next ROCm version. Note that exllama v2 is also affected; this could have easily been fixed locally in exllama with a small hack, like it was done in llama.cpp, but I didn't have the hardware to test.
I can now report that, using the latest drivers, it seems to work now.
Here's another bug on Oobabooga's project that is unresolved...
oobabooga/text-generation-webui#2923
I realized that the ExLlama team may have a solution... so I thought I'd cross-post this issue on this project, in case you've not seen it.
Here's the guide I wrote to get everything working on AMD kit...
https://github.com/nktice/AMD-AI
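For reference, a rough sketch of the multi-card load path I'm describing, assuming exllama's standard config/tokenizer/generator API (paths and split sizes are placeholders):

```python
# Sketch of a two-GPU load; class and method names assume the exllama repo
# layout (model.py, tokenizer.py, generator.py) -- adjust to your checkout.
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

config = ExLlamaConfig("/path/to/model/config.json")     # placeholder path
config.model_path = "/path/to/model.safetensors"         # placeholder path
config.set_auto_map("10,20")                             # split layers across two GPUs

model = ExLlama(config)
tokenizer = ExLlamaTokenizer("/path/to/tokenizer.model") # placeholder path
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

# On one card this produces normal text; with the split above, the ROCm setup
# described in the guide comes back with gibberish.
print(generator.generate_simple("Sally has three brothers.", max_new_tokens=64))
```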
Models load fine when running on only one card; here are some results:
https://github.com/nktice/AMD-AI/blob/main/SallyAIRiddle.md
Multi-card loading only spits out gibberish; here's an example: