Add experimental install guide for ROCm #1550

Open
wants to merge 4 commits into base: main


xzuyn
Contributor

xzuyn commented Apr 19, 2024

Description

This adds a guide on how to install Axolotl for ROCm users.

Currently you need to install the packages included in `pip install -e '.[deepspeed]'`, then uninstall torch, xformers, and bitsandbytes so that you can install the ROCm versions of torch and bitsandbytes. The process is definitely janky, since you install packages you don't want only to uninstall them afterwards.

Installing the ROCm version of torch first, to skip a step, causes the Axolotl install to fail, so this order is necessary unless requirements.txt or setup.py is changed.

This setup could be improved by modifying setup.py to add `[amd]` and `[nvidia]` options, so that torch, bitsandbytes, and xformers are not installed by default. That would remove the install-then-uninstall step that currently comes before installing the ROCm packages.
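A minimal sketch of how such a split might look, assuming setup.py works from a flat list of requirement strings. `NVIDIA_ONLY`, `requirement_name`, and `split_requirements` are hypothetical names, not Axolotl's actual setup.py: vendor-neutral packages go into `install_requires`, while torch, xformers, and bitsandbytes are only pulled in by `[nvidia]`.

```python
import re

# Hypothetical split: packages whose default PyPI wheels target NVIDIA GPUs.
NVIDIA_ONLY = {"torch", "xformers", "bitsandbytes"}


def requirement_name(req: str) -> str:
    """Return the PEP 503-normalized project name, e.g. 'Flash_Attn==2.5.0' -> 'flash-attn'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def split_requirements(reqs: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Split requirements into install_requires and extras_require for setup()."""
    base: list[str] = []
    nvidia: list[str] = []
    for req in reqs:
        target = nvidia if requirement_name(req) in NVIDIA_ONLY else base
        target.append(req)
    # ROCm users install their own torch/bitsandbytes builds first, so [amd] adds nothing here.
    extras = {"nvidia": nvidia, "amd": []}
    return base, extras


# In setup.py (hypothetical usage):
# install_requires, extras_require = split_requirements(requirements)
# setup(..., install_requires=install_requires, extras_require=extras_require)
```

Existing extras such as `deepspeed` would be merged into the same dict. ROCm users would then install the ROCm build of torch first and run `pip install -e '.[deepspeed,amd]'`; because other dependencies still require `torch`, this relies on the already-installed ROCm build satisfying that requirement.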

Motivation and Context

I still see people on sites like Reddit asking whether it's possible yet to train AI models on AMD hardware. I want more people to know that it is, although it's still experimental.

How has this been tested?

On my personal system: Ubuntu 22.04.4 with ROCm 6.1 and an RX 7900 XTX. I've been using Axolotl (and other PyTorch-based AI tools like kohya_ss) this way for months, across various versions of PyTorch and ROCm, without major issues.
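When reproducing this setup, a quick way to confirm that the venv ended up with a ROCm build of PyTorch rather than a CUDA or CPU-only one might look like this. `describe_torch_backend` is a hypothetical helper; it only reads `torch.version.hip`, `torch.version.cuda`, and `torch.cuda.is_available()`. On ROCm builds the GPU is exposed through the `torch.cuda` namespace.

```python
def describe_torch_backend() -> dict:
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "torch": None, "hip": None, "cuda": None, "gpu_available": False}
    return {
        "installed": True,
        "torch": torch.__version__,  # wheels from the PyTorch ROCm index carry a '+rocm' suffix
        "hip": torch.version.hip,  # version string on ROCm builds, None otherwise
        "cuda": torch.version.cuda,  # None on ROCm and CPU-only builds
        "gpu_available": torch.cuda.is_available(),  # ROCm GPUs also show up via torch.cuda
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```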

The only problem I've had was after upgrading to ROCm 6.0.2, when training only output a loss of 0. This might just have been an issue with how I upgraded.

Types of changes

This only adds content to README.md.

@winglian
Collaborator

Thanks for this @xzuyn! Would it be helpful if I handled this in the Docker images for you? Do you use the Docker images?

@ehartford does this line up with the AMD work that you've been doing?

@ehartford
Collaborator

I'm happy to test it

@xzuyn
Contributor Author

xzuyn commented Apr 19, 2024

> would it be helpful if I handled this in the docker images for you? Do you use the docker images?

The only time I use Docker is on RunPod, and there I'm using an NVIDIA GPU. Even though it's a little janky, the ROCm setup is fairly straightforward for me to do in a venv.

The `main` branch is at `0.41.3.post1`, so using the `rocm` branch brings us to `0.42.0`.
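The version bump can be checked with a plain release-number comparison. The `release_tuple` helper is hypothetical and deliberately simplified: it ignores `.post`/`.dev` suffixes (so `0.44.0.dev0` compares equal to `0.44.0`). `packaging.version.Version` implements the full PEP 440 ordering and is the safer choice in real code.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Numeric release part with trailing zeros dropped, e.g. '0.41.3.post1' -> (0, 41, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no release number in {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # PEP 440 treats 0.42 and 0.42.0 as equal, so drop trailing zero segments
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


main_branch = release_tuple("0.41.3.post1")  # version on the `main` branch
rocm_branch = release_tuple("0.42.0")  # version on the `rocm` branch
print(main_branch < rocm_branch)  # True
```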
@xzuyn
Contributor Author

xzuyn commented Apr 20, 2024

I updated the README to use the `rocm` branch instead of the `main` branch, since it seems to be newer.

However, arlo-phoenix now recommends using the official ROCm `rocm_enabled` branch of bitsandbytes instead of his fork. It's more up to date (26 commits behind and at 0.44.0.dev0, vs. 140 commits behind and at 0.42.0) and appears to install without issue, but running Axolotl gives an error:

```
Could not find the bitsandbytes CUDA binary at PosixPath('/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_hip_nohipblaslt.so')
Could not load bitsandbytes native library: /media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 122, in <module>
    lib = get_native_library()
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 104, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
  File "/usr/lib/python3.10/ctypes/__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
```

So for now, this is the latest version I can confirm works for me.
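The traceback shows bitsandbytes looking for a native library (`libbitsandbytes_hip_nohipblaslt.so`, then the `libbitsandbytes_cpu.so` fallback) that is not present in the installed package directory. A small diagnostic like the following, written as a hypothetical helper, lists which `libbitsandbytes*.so` files actually exist there. It locates the package with `importlib.util.find_spec` instead of importing it, since importing is what triggers the failing load.

```python
import importlib.util
from pathlib import Path


def list_native_libs(package: str = "bitsandbytes", pattern: str = "libbitsandbytes*.so") -> list[str]:
    """List files matching `pattern` inside an installed package, without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a single-file module rather than a package
    found: list[str] = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(p.name for p in Path(location).glob(pattern)))
    return found


if __name__ == "__main__":
    libs = list_native_libs()
    print("\n".join(libs) if libs else "no libbitsandbytes*.so found")
```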

@ehartford
Collaborator

I go the other way around: I modify requirements.txt so it doesn't install torch, bitsandbytes, xformers, flash attention, triton, deepspeed, etc., and then I install those manually myself.

@lizamd

lizamd commented May 2, 2024

Hi @xzuyn, thanks for the effort. I have been trying this on ROCm with no luck yet. Can we connect internally with AMD? Is it OK to put contact information on your profile? My email is: [email protected]

Without this you get `NameError: name 'amdsmi' is not defined`
@winglian
Collaborator

winglian commented Dec 5, 2024

Looks like it should work out of the box, according to our docs: https://github.com/axolotl-ai-cloud/axolotl/blob/main/docs/amd_hpc.qmd

@hsmallbone

hsmallbone commented Jan 9, 2025

I have gotten Axolotl working with DeepSpeed on ROCm (MI250X) and can assist with the install guide. The guide currently in production doesn't mention the bitsandbytes requirement, and I believe flash-attn can now be pip-installed directly. ROCm 6.1+ is essentially mandatory for most packages.

https://huggingface.co/docs/bitsandbytes/main/en/installation?platform=ROCm#installation

```
pip install --no-deps --force-reinstall 'https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_multi-backend-refactor/bitsandbytes-0.44.1.dev0-py3-none-manylinux_2_24_x86_64.whl'
```

5 participants