LLaMA-13B on AMD GPUs #166
Comments
How did you load LLaMA-13B into a 16GB GPU without 8-bit?
using
13b/20b models are loaded in 8-bit mode by default (when no flags are specified) because they are too large to fit in consumer GPUs.
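For reference, this is roughly what 8-bit loading looks like when done directly through transformers and bitsandbytes (a minimal sketch, not the web UI's actual code; the model path is a placeholder, and accelerate is also needed for `device_map="auto"`):

```python
# Minimal sketch of 8-bit loading with transformers + bitsandbytes.
# The model path is a placeholder; the web UI wires this up internally
# based on the command-line flags you pass.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/llama-13b"  # placeholder path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",   # spread layers across available GPU/CPU memory
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
)
```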
Fixed it, got 8-bit working. I had to update bitsandbytes-rocm to use ROCm 5.4.0 (https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-1) and sent in a pull request: broncotc/bitsandbytes-rocm#4
Edit: it seems the 6900xt itself has issues with int8, which this fork (https://github.com/0cc4m/bitsandbytes-rocm/tree/rocm) tries to address, but that fork has its own issues. Doing some investigation.
Edit 2: relates to this issue: bitsandbytes-foundation/bitsandbytes#165
Edit 3: turns out it's something wrong with the generation settings? It only seems to fail when using the "NovelAI Sphinx Moth" preset, among others.
Nice @Titaniumtown, thanks for the update.
@oobabooga do you understand anything about what could be causing the generation issues? It seems to only be the case with specific combinations of generation settings.
What error appears when you use Sphinx Moth? This is a preset with high temperature and small top_k and top_p for creative but coherent outputs.
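For anyone trying to reproduce this outside the UI, here is an illustrative sketch of how a preset like that maps onto transformers' sampling parameters. The numbers are placeholders rather than the actual Sphinx Moth values, and `model`/`tokenizer` are assumed to be the objects loaded as in the earlier 8-bit sketch:

```python
# Illustrative only: a "high temperature, small top_k/top_p" style preset expressed
# as transformers sampling parameters. Values below are placeholders.
input_ids = tokenizer(
    "The ruins at the edge of the forest", return_tensors="pt"
).input_ids.to(model.device)

output_ids = model.generate(
    input_ids,
    do_sample=True,      # sample instead of greedy decoding
    temperature=1.9,     # placeholder: high temperature flattens the distribution
    top_k=30,            # placeholder: small top_k keeps few candidate tokens
    top_p=0.2,           # placeholder: small top_p keeps a tight nucleus
    max_new_tokens=200,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```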
I get this error when I try to use 8-bit mode on my GTX 1650. It's an upstream issue in the bitsandbytes library, as you found.
Ah, so there's nothing I can do about it. Sad. Thanks!
Change the 8-bit threshold. It will probably help on AMD as well. I cannot test because my old card doesn't work with ROCm due to AGP 2.0. It only works in Windows.
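For what it's worth, here is a sketch of how the int8 outlier threshold can be adjusted when loading through transformers. Whether `BitsAndBytesConfig` is available depends on the transformers version, the model path is a placeholder, and the threshold value shown is only an example:

```python
# Sketch of adjusting the int8 outlier threshold. Hidden-state values above the
# threshold are treated as outliers and computed in fp16; the rest take the int8
# path. 6.0 is the library default; the right value for a given card is something
# to experiment with.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,  # example value; raise or lower and retest
)
model = AutoModelForCausalLM.from_pretrained(
    "models/llama-13b",      # placeholder path
    device_map="auto",
    quantization_config=quant_config,
)
```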
@Ph0rk0z I just use 4-bit models now. Works like a dream and has much better performance.
@Titaniumtown can you share how to use a 4-bit model on an AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1 (Installation) for GPTQ-for-LLaMa seems to require CUDA?
It does not require CUDA. ROCm works just fine. I just ran the script the way Nvidia users do and it worked perfectly.
Thank you! I will give it a try.
@Titaniumtown I tried to set things up and run it just as the guide explains, i.e. as you said, just running the script like an Nvidia user would. But I get errors about missing headers when running "python setup_cuda.py install": #487. Could you help me? Am I missing something important?
@vivaperon do you have CUDA installed?
Getting these errors when trying to compile GPTQ-for-LLaMa.
The 8-bit model runs fine once I got bitsandbytes-rocm installed. I have also attached the full log of the compilation.
@viliger2 @vivaperon this seems to be caused by GPTQ-for-LLaMa commits after 841feed using fp16 types. As far as I can tell, HIP doesn't handle some of the implicit casts. Rolling back to that commit results in successful compilation.
Yes.
Thanks a lot! I will try this today when I get home from work and let you guys know. Btw, these are my PC specs:
@viliger2 @vivaperon I had a chance to take another look at the issue, and I now have the latest version of GPTQ-for-LLaMa working with HIP. If you're interested, I posted my findings here. I also have a fork of the repo with my changes here.
@arctic-marmoset Thanks!
@arctic-marmoset wow, thanks a lot!! Will try your fork today! At last I can put my 6600 to some useful work lol
I tried your repo and got this error: No ROCm runtime is found, using ROCM_HOME='/opt/rocm-5.4.3'
@vivaperon it seems there is an issue with your ROCm installation; check whether it is actually installed, or whether your version differs from 5.4.3.
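As a quick diagnostic, assuming PyTorch is already installed, something like this should confirm whether the PyTorch build actually targets ROCm and can see the GPU:

```python
# Check whether the installed PyTorch build targets ROCm and can see the GPU.
# On a ROCm build, torch.version.hip is a version string (None on CUDA/CPU-only
# builds), and the GPU is exposed through the regular torch.cuda API.
import torch

print("HIP version:", torch.version.hip)
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```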
How do I install the CUDA extension with an AMD GPU? If I run "python setup_cuda.py install" inside the "GPTQ-for-LLaMa" folder, it returns this error:
I have a 6900xt and tried to load the LLaMA-13B model, but I ended up getting this error:
Going into modules/models.py and setting "load_in_8bit" to False fixed it, but this should work by default.
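As a rough illustration of the kind of change being asked for, here is a hypothetical sketch of making the 8-bit default overridable instead of hard-coded. It does not reproduce the actual logic in modules/models.py, and the `no_8bit` flag is invented for the example:

```python
# Hypothetical sketch only: let explicit flags override the "large models default
# to 8-bit" heuristic. This is not the actual modules/models.py code.
def should_load_in_8bit(model_name: str, args) -> bool:
    if getattr(args, "load_in_8bit", False):   # user explicitly asked for 8-bit
        return True
    if getattr(args, "no_8bit", False):        # invented opt-out flag for illustration
        return False
    # Otherwise fall back to the size heuristic.
    return any(size in model_name.lower() for size in ("13b", "20b", "30b"))
```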