-
I've been traveling so just seeing this but there are two issues/discussions I filed that I'll link to:
So the EDIT: as of 2024-06-23, these are the supported architectures (from the CMakeLists.txt):
I did spot the
-
Even with TORCH_BLAS_PREFER_HIPBLASLT=0 I still get this error. Collapsed sections from my ComfyUI startup log (Linux) follow; ComfyUI successfully installed dependencies and started up.
- Platform Details
- VRAM Settings
- Cross Attention: Using sub quadratic optimization for cross-attention
- Loaded Custom Nodes and Modules
- Sample Generation
- Errors & Warnings / Highlighted Warnings
-
Hello everyone, I was asked to share my workaround for an error I've been struggling with for a few days now. Huge thanks to @pbontrager, @RdoubleA and @ebsmothers from the torchtune Discord for helping me figure things out, and @YellowRoseCx from the KoboldAI Discord for figuring out this fix.
The error is as follows:
rocblaslt warning: No paths matched /home/lamim/pytorch/pytorch/.venv/lib64/python3.11/site-packages/torch/lib/hipblaslt/library/*gfx1030*co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
And will end with:
RuntimeError: CUDA error: HIPBLAS_STATUS_NOT_SUPPORTED when calling HIPBLAS_STATUS_NOT_SUPPORTED
After a lot of back and forth, reinstalling ROCm 6.0 from the official Fedora 40 repos and then 6.1 from the AMD ROCm repos, I could not get around this issue. However, YellowRoseCx was able to pinpoint the relevant issues: pytorch/pytorch#119081 (comment) and pytorch/pytorch#128753
They noted that
It looks like a fix is being worked on in the ROCm/pytorch repo, so we were going to try building from that repo, but YellowRoseCx offered an alternative workaround that has worked for me. We run the two following commands:
TORCH_BLAS_PREFER_HIPBLASLT=0
then
export TORCH_BLAS_PREFER_HIPBLASLT
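For convenience, the two commands above can be collapsed into a single `export` (a sketch with the same effect; the `echo` is only there to confirm the value is set):

```shell
# Set and export the variable in one step, equivalent to assigning it
# and then exporting it separately:
export TORCH_BLAS_PREFER_HIPBLASLT=0

# Verify it will be inherited by child processes, such as the Python
# process that runs torchtune; this should print 0:
echo "${TORCH_BLAS_PREFER_HIPBLASLT}"
```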
This has worked for me (on Fedora 40). I'm now able to train models on datasets and finish the process (tested on a 6900 XT doing LoRA training on a Phi mini). I don't know if there are any other issues yet, or if it even trained properly, but I will update this post as I do more testing. I am using torchtune built from this repo on 6/19/2024, and ROCm 6.1.2 installed using the official AMD documentation for RHEL 9.4.
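One note on scope (my assumption, not something from the post above): `export` affects the whole shell session. If you'd rather apply the workaround to a single run only, you can prefix the command instead:

```shell
# Per-command form: the variable is set only for this one child
# process, e.g. a torchtune recipe invocation would look like
#   TORCH_BLAS_PREFER_HIPBLASLT=0 tune run <recipe> --config <config>
# (recipe and config names are placeholders). Demonstrated here with a
# trivial child process that just prints the value it received:
TORCH_BLAS_PREFER_HIPBLASLT=0 python3 -c 'import os; print(os.environ["TORCH_BLAS_PREFER_HIPBLASLT"])'
```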