LLaMA-13B on AMD GPUs #166

Closed
Titaniumtown opened this issue Mar 5, 2023 · 29 comments

@Titaniumtown

I have a 6900xt, and when I tried to load the LLaMA-13B model I got this error:

Traceback (most recent call last):
  File "server.py", line 188, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/var/home/riley/text-generation-webui/modules/models.py", line 122, in load_model
    model = eval(command)
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2630, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2953, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.8/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
    set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
  File "/opt/conda/lib/python3.8/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
  File "/opt/conda/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 196, in to
    return self.cuda(device)
  File "/opt/conda/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/opt/conda/lib/python3.8/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/opt/conda/lib/python3.8/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 386, in __getattr__
    func = self.__getitem__(name)
  File "/opt/conda/lib/python3.8/ctypes/__init__.py", line 391, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Going into modules/models.py and setting "load_in_8bit" to False fixed it, but this should work by default.

@oobabooga
Owner

How did you load LLaMA-13B into a 16GB GPU without 8-bit?

@Titaniumtown
Author

> How did you load LLaMA-13B into a 16GB GPU without 8-bit?

using --auto-devices

@oobabooga
Owner

13b/20b models are loaded in 8-bit mode by default (when no flags are specified) because they are too large to fit in consumer GPUs.

--auto-devices disables this default behavior without the need for any manual changes to the code.
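
For reference, here is a rough sketch of the two load paths being discussed, written against plain transformers rather than the webui's actual loader; the model path and memory limits are made-up examples.

import torch
from transformers import AutoModelForCausalLM

MODEL_PATH = "models/llama-13b"  # placeholder path

# Default path for 13B/20B checkpoints: 8-bit quantisation via bitsandbytes.
# This is the path that triggers the libbitsandbytes_cpu.so error on ROCm.
model_8bit = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, device_map="auto", load_in_8bit=True
)

# Rough equivalent of --auto-devices: skip 8-bit and split the fp16 weights
# between GPU VRAM and CPU RAM instead (the limits below are illustrative).
model_fp16 = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    torch_dtype=torch.float16,
    max_memory={0: "14GiB", "cpu": "48GiB"},
)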

@Titaniumtown
Author

Titaniumtown commented Mar 6, 2023

Fixed it and got 8-bit working. I had to update bitsandbytes-rocm to use ROCm 5.4.0 (https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-1) and sent in a pull request: broncotc/bitsandbytes-rocm#4

Edit: it seems the 6900xt itself has issues with int8, which this fork (https://github.com/0cc4m/bitsandbytes-rocm/tree/rocm) tries to address, but it has its own issues. Doing some investigation.

Edit 2: relates to this issue (bitsandbytes-foundation/bitsandbytes#165)

Edit 3: turns out it's something wrong with the generation settings? It only seems to fail when using the "NovelAI Sphinx Moth" preset, among others.

@oobabooga
Owner

Nice @Titaniumtown, thanks for the update.

@Titaniumtown
Author

@oobabooga do you have any idea what could be causing the generation issues? It only seems to happen with specific combinations of generation settings.

@oobabooga
Owner

What error appears when you use Sphinx Moth? It's a preset with a high temperature and small top_k and top_p, meant for creative but coherent outputs.

@Titaniumtown
Author

  0%|                                                    | 0/26 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/routes.py", line 374, in run_predict
    output = await app.get_blocks().process_api(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/blocks.py", line 1017, in process_api
    result = await self.call_function(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/blocks.py", line 849, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/gradio/utils.py", line 453, in async_iteration
    return next(iterator)
  File "/var/home/riley/text-generation-webui/modules/text_generation.py", line 188, in generate_reply
    output = eval(f"shared.model.generate({', '.join(generate_params)}){cuda}")[0]
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/transformers/generation/utils.py", line 2504, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

@oobabooga
Owner

I get this error when I try to use 8-bit mode on my GTX 1650. It's an upstream issue in the bitsandbytes library, as you found.
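
For anyone reproducing this outside the webui, here is a minimal sketch of what a high-temperature, small-top_k/top_p preset boils down to once it reaches transformers; the model path and sampling numbers are placeholders, not the exact Sphinx Moth values.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("models/llama-13b")  # placeholder path
model = AutoModelForCausalLM.from_pretrained(
    "models/llama-13b", device_map="auto", load_in_8bit=True
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.9,   # placeholder: a very high temperature flattens the distribution
    top_k=30,          # placeholder: small top_k/top_p then re-sharpen it
    top_p=0.2,
    max_new_tokens=200,
)
# If the int8 matmul misbehaves, the logits can contain inf/nan, and
# torch.multinomial then raises the "probability tensor" RuntimeError above.
print(tokenizer.decode(output[0], skip_special_tokens=True))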

@Titaniumtown
Author

Ah, so there's nothing I can do about it. Sad. Thanks!

@Ph0rk0z
Contributor

Ph0rk0z commented Mar 10, 2023

Change the 8-bit threshold. It will probably help on AMD as well. I can't test because my old card doesn't work with ROCm due to AGP 2.0; it only works in Windows.
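
For anyone who wants to try that without editing the webui code, here is a minimal sketch assuming a transformers version that exposes BitsAndBytesConfig; 6.0 is just the library's documented default, and the right value for a given card is trial and error.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# llm_int8_threshold controls which activation outliers are kept in fp16
# instead of going through the int8 matmul; tweaking it is the workaround
# suggested above. The value below is only the default, as a starting point.
bnb_config = BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=6.0)

model = AutoModelForCausalLM.from_pretrained(
    "models/llama-13b",            # placeholder path
    device_map="auto",
    quantization_config=bnb_config,
)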

@Titaniumtown
Author

@Ph0rk0z I just use 4bit models now. Works like a dream and has much better performance.

@ttio2tech

@Titaniumtown can you share how to use 4bit model for AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1: Installation for GPTQ-for-LLaMa requires CUDA?

@Titaniumtown
Author

> @Titaniumtown can you share how to use 4bit model for AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1: Installation for GPTQ-for-LLaMa requires CUDA?

It does not require cuda. rocm works just fine. I just ran the script like Nvidia users do and it worked perfectly.
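
One quick way to confirm the build actually worked under ROCm (a sketch; the extension name quant_cuda comes from that repo's setup_cuda.py):

# If setup_cuda.py succeeded, the compiled kernels import cleanly and the
# printed path should point inside your environment's site-packages.
import quant_cuda

print("quant_cuda loaded from:", quant_cuda.__file__)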

@ttio2tech

> @Titaniumtown can you share how to use 4bit model for AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1: Installation for GPTQ-for-LLaMa requires CUDA?

> It does not require cuda. rocm works just fine. I just ran the script like Nvidia users do and it worked perfectly.

Thank you! I will give it a try

@atanasopulo

> @Titaniumtown can you share how to use 4bit model for AMD GPU? I was looking at https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model, but Step 1: Installation for GPTQ-for-LLaMa requires CUDA?

> It does not require cuda. rocm works just fine. I just ran the script like Nvidia users do and it worked perfectly.

@Titaniumtown I tried to set things up and run it just like the guide explains, i.e. just running the script like an Nvidia user would, as you said. But I get errors about missing headers when running "python setup_cuda.py install": #487

Could you help me? Am I missing something important?
I'm new to all this stuff, btw, so I'm sure I'm misunderstanding or missing something.

@Titaniumtown
Author

@vivaperon do you have cuda installed?

@viliger2

viliger2 commented Mar 26, 2023

Getting these errors when trying to compile GPTQ-for-LLaMA:

/home/viliger/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_hip_kernel.hip:653:10: error: use of overloaded operator '=' is ambiguous (with operand types 'half2' (aka '__half2') and 'void')
    res2 = {};
/home/viliger/text-generation-webui/repositories/GPTQ-for-LLaMa/quant_hip_kernel.hip:665:12: error: no matching function for call to '__half2float'
    res += __half2float(res2.x) + __half2float(res2.y);

The 8-bit model runs fine once I got bitsandbytes-rocm installed. I've also attached the full compilation log:
output.txt

@arctic-marmoset

arctic-marmoset commented Mar 27, 2023

@viliger2 @vivaperon this seems to be caused by GPTQ-for-LLaMA commits after 841feed using fp16 types. HIP doesn't seem to handle some implicit casts as far as I can tell. Rolling back to that commit results in successful compilation.

@atanasopulo

> @vivaperon do you have cuda installed?

Yes.

@atanasopulo

> @viliger2 @vivaperon this seems to be caused by GPTQ-for-LLaMA commits after 841feed using fp16 types. HIP doesn't seem to handle some implicit casts as far as I can tell. Rolling back to that commit results in successful compilation.

Thanks a lot! I will try this today when I get home from work, and let you guys know.

Btw, these are my PC specs:
Xeon E5-2620v2, 16GB ECC DDR3 RAM, AMD RX6600 8GB.

@arctic-marmoset

@viliger2 @vivaperon I had a chance to take another look at the issue, and I now have the latest version of GPTQ-for-LLaMA working with HIP. If you're interested, I posted my findings here. I also have a fork of the repo with my changes here.

@Titaniumtown
Author

@arctic-marmoset Thanks!

@atanasopulo

@arctic-marmoset wow, thanks a lot!! Will try your fork today! At last I can put my 6600 to do useful work lol

@atanasopulo

> I also have a fork of the repo with my changes here.

I tried your repo and got this error:

No ROCm runtime is found, using ROCM_HOME='/opt/rocm-5.4.3'
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running install
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'quant_cuda' extension
gcc -pthread -B /home/christopher/miniconda3/envs/gptq/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/christopher/miniconda3/envs/gptq/include -I/home/christopher/miniconda3/envs/gptq/include -fPIC -O2 -isystem /home/christopher/miniconda3/envs/gptq/include -fPIC -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/TH -I/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/christopher/miniconda3/envs/gptq/include/python3.9 -c quant_cuda.cpp -o build/temp.linux-x86_64-cpython-39/quant_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/cuda/__init__.py:546: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
Traceback (most recent call last):
File "/home/christopher/GPTQ-for-LLaMA-fork-amd/GPTQ-for-LLaMa-hip/setup_cuda.py", line 12, in
setup(
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/init.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/install.py", line 74, in run
self.do_egg_install()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/install.py", line 123, in do_egg_install
self.run_command('bdist_egg')
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 165, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
self.run_command(cmdname)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 112, in build
self.run_command('build_ext')
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
self._build_extensions_serial()
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
self.build_extension(ext)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
objects = self.compiler.compile(
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 599, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_single_compile
cflags = unix_cuda_flags(cflags)
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 548, in unix_cuda_flags
cflags + _get_cuda_arch_flags(cflags))
File "/home/christopher/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1773, in _get_cuda_arch_flags
arch_list[-1] += '+PTX'
IndexError: list index out of range

@viliger2

viliger2 commented Mar 30, 2023

@vivaperon it seems there's an issue with your ROCm installation. Check whether it's actually installed and whether the version is something other than 5.4.3.
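
A quick way to check that from Python (a sketch): a ROCm build of PyTorch reports the HIP version it was built against, while a CUDA or CPU build reports None, which matches the "No ROCm runtime is found" message and the empty arch list behind the '+PTX' IndexError above.

import torch

print("HIP version PyTorch was built against:", torch.version.hip)
print("GPU visible to torch:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))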

@belqit

belqit commented Apr 4, 2023

(textgen) root@gribaai:~/text-generation-webui# python server.py --model llama-13b-4bit-128g --wbits 4 --groupsize 128

CUDA SETUP: Loading binary /root/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libsbitsandbytes_cpu.so...
Loading llama-13b-4bit-128g...
CUDA extension not installed.
Traceback (most recent call last):
  File "/root/text-generation-webui/server.py", line 276, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/root/text-generation-webui/modules/models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "/root/text-generation-webui/modules/GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "/root/text-generation-webui/modules/GPTQ_loader.py", line 36, in _load_quant
    make_quant(model, layers, wbits, groupsize, faster=faster_kernel, kernel_switch_threshold=kernel_switch_threshold)
TypeError: make_quant() got an unexpected keyword argument 'faster'
(textgen) root@gribaai:~/text-generation-webui#

How do I install the CUDA extension with an AMD GPU?

If I run "python setup_cuda.py install" inside the "GPTQ-for-LLaMa" folder, it gives me this error:

............
File "/root/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1780, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range

@arctic-marmoset

@belqit Please see @viliger2's comment above. You'll need to install ROCm 5.4.3.

@belqit

belqit commented Apr 4, 2023

> @belqit Please see @viliger2's comment above. You'll need to install ROCm 5.4.3.

[image]
