-
I've been traveling so just seeing this but there are two issues/discussions I filed that I'll link to:
So the EDIT: as of 2024-06-23, these are the supported architectures (from the CMakeLists.txt):
I did spot the
-
Even with TORCH_BLAS_PREFER_HIPBLASLT=0 I still get this error. Collapsed sections from my ComfyUI startup log (Linux) follow; ComfyUI successfully installed dependencies and started up.
- Platform Details
- VRAM Settings
- Cross Attention: Using sub quadratic optimization for cross-attention
- Loaded Custom Nodes and Modules
- Sample Generation
- Errors & Warnings / Highlighted Warnings
-
Hello everyone, I was asked to share my workaround for an error I've been struggling with for a few days now. Huge thanks to @pbontrager, @RdoubleA and @ebsmothers from the torchtune Discord for helping me figure things out, and @YellowRoseCx from the KoboldAI Discord for figuring out this fix.
The error is as follows:
rocblaslt warning: No paths matched /home/lamim/pytorch/pytorch/.venv/lib64/python3.11/site-packages/torch/lib/hipblaslt/library/*gfx1030*co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
And will end with:
RuntimeError: CUDA error: HIPBLAS_STATUS_NOT_SUPPORTED when calling HIPBLAS_STATUS_NOT_SUPPORTED
After a lot of back and forth, reinstalling ROCm 6.0 from the official Fedora 40 repos and then 6.1 from the AMD ROCm repos, I could not get around this issue. However, YellowRoseCx was able to pinpoint the relevant issues: pytorch/pytorch#119081 (comment) and pytorch/pytorch#128753
They noted that
It looks like a fix is being worked on in the ROCm/pytorch repo, so we were going to try building from that repo, but YellowRoseCx offered an alternative workaround that has worked for me. We run the two following commands:
TORCH_BLAS_PREFER_HIPBLASLT=0
then
export TORCH_BLAS_PREFER_HIPBLASLT
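For convenience, the two commands above can be collapsed into a single `export` (a sketch with the same effect; the `echo` is only there to confirm the value is set):

```shell
# Set and export the variable in one step, equivalent to assigning it
# and then exporting it separately:
export TORCH_BLAS_PREFER_HIPBLASLT=0

# Verify it will be inherited by child processes, such as the Python
# process that runs torchtune; this should print 0:
echo "${TORCH_BLAS_PREFER_HIPBLASLT}"
```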
This has worked for me (on Fedora 40). I'm now able to train models on datasets and finish the process (tested on a 6900 XT doing LoRA training on a Phi mini). I don't know if there are any other issues yet, or if it even trained properly, but I will update this post as I do more testing. I am using torchtune built from this repo on 6/19/2024, and ROCm 6.1.2 installed using the official AMD documentation for RHEL 9.4.
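One note on scope (my assumption, not something from the post above): `export` affects the whole shell session. If you'd rather apply the workaround to a single run only, you can prefix the command instead:

```shell
# Per-command form: the variable is set only for this one child
# process, e.g. a torchtune recipe invocation would look like
#   TORCH_BLAS_PREFER_HIPBLASLT=0 tune run <recipe> --config <config>
# (recipe and config names are placeholders). Demonstrated here with a
# trivial child process that just prints the value it received:
TORCH_BLAS_PREFER_HIPBLASLT=0 python3 -c 'import os; print(os.environ["TORCH_BLAS_PREFER_HIPBLASLT"])'
```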