[Bug]: rocBLAS error: Cannot read TensileLibrary.dat: No such file or directory #1936
Comments
Hi @slipperyslipped. Your GPU uses the gfx1031 instruction set, but the binaries distributed by AMD are not built for that architecture as it is not officially supported. However, the gfx1030 instruction set is identical to the gfx1031 instruction set in all but name. For this reason, there are ways to get the existing binaries running on your GPU. As a workaround, I would recommend setting the environment variable |
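The variable name is cut off in the comment above. Assuming the override meant is the ROCm runtime's `HSA_OVERRIDE_GFX_VERSION`, which takes a dotted major.minor.stepping value (later comments mention `11.0.0`), a minimal sketch of turning a gfx name into that form might look like this. The helper names are hypothetical, and the hexadecimal reading of the last two characters is an assumption based on names such as gfx90a.

```python
import os


def gfx_to_override_version(arch):
    """Convert a name such as 'gfx1030' into a dotted string such as '10.3.0'."""
    if not arch.startswith("gfx") or len(arch) < 6:
        raise ValueError(f"unexpected architecture name: {arch!r}")
    digits = arch[3:]
    major = int(digits[:-2])        # leading digits, decimal
    minor = int(digits[-2], 16)     # assumed to be a single hex digit
    stepping = int(digits[-1], 16)  # assumed to be a single hex digit
    return f"{major}.{minor}.{stepping}"


def env_with_override(arch, base_env=None):
    """Return a copy of the environment with the (assumed) override variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_to_override_version(arch)
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` so that the override only affects the launched program. The override is an unsupported workaround and, as a later comment warns, it is not safe between every pair of architectures.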
Hi, I was blocked by the same problem. I am using a gfx1013 device. Can I set PyTorch not to use rocBLAS for this? |
I'm not an expert on PyTorch, but the gfx1013 ISA is a superset of the gfx1010 ISA. You can set |
@cgmb gfx1010 produces the same issue:
$ drun --rm rocm/dev-ubuntu-22.04:5.6-complete
root@ftl:/# ls -1 /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat
As you see there is no |
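The same check as the `ls` listing above can be sketched in Python, which is handy when it needs to run from a script. The function name is hypothetical, and the default directory is simply the path shown in the listing; other installs may use a versioned prefix such as `/opt/rocm-5.4.0`.

```python
import re
from pathlib import Path

# Assumed default, copied from the listing in this thread.
DEFAULT_LIBRARY_DIR = Path("/opt/rocm/lib/rocblas/library")

_LAZY_LIBRARY = re.compile(r"^TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def packaged_architectures(library_dir=DEFAULT_LIBRARY_DIR):
    """Return the sorted gfx names that have a TensileLibrary_lazy_<arch>.dat file."""
    library_dir = Path(library_dir)
    if not library_dir.is_dir():
        return []
    found = set()
    for entry in library_dir.iterdir():
        match = _LAZY_LIBRARY.match(entry.name)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

If the architecture of the installed GPU is missing from the returned list, the packaged rocBLAS has no Tensile library built for it, which matches the situation described in this comment.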
Thanks @ulyssesrr. That's a great analysis of the problem. It's perhaps worth noting that the OS-provided rocBLAS package on Debian 13 (Testing/Trixie) and the upcoming Ubuntu 23.10 (Mantic Minotaur) builds Tensile with The OS-provided package for rocBLAS on Debian/Ubuntu also automatically handles loading code objects for ISAs that are known to be compatible as I'd suggested earlier in this thread. For this reason, the OS-provided package has much wider hardware compatibility than the AMD-provided package on GFX9 and GFX10 hardware. I have not tested the OS-provided packages on all hardware platforms, but the tests are also packaged in the OS package Just mentioning it, since that's probably a useful workaround for some people on hardware that is not officially supported. Even folks on other operating systems could potentially spin up a docker container with an Ubuntu or Debian image and |
@cgmb I forgot to mention that the rocBLAS build script on 5.6.0 seems to have an issue where The rmake.py script treats the CMake flags However I was getting them enabled by default, thus I had to actually opt out. I'm guessing it is being done here: I didn't debug much, just rolled a patch and went my way (which I ended up not needing, as I patched the Tensile issue): As I didn't debug much, I didn't feel confident enough to open an issue. |
FYI seeing what seems to be the same GPU is a 7800 XT. Stack from running a basic PyTorch example under GDB is shown below. I did have to override gfx version to either
|
Did you only install the Radeon Software or did you also install ROCm? |
@YellowRoseCx Yes, rocm was installed. But there were some errors and perhaps there is a version mismatch. I have since reinstalled the whole machine and here is the current state:
Same segfault and stack looks similar. Here is a basic log of what I tried this time:
This time I opted for the AMDGPU install flow option in the ROCm install guide. Running the installer from the Note PyTorch repo is |
The RX 7800 XT (Navi 32) is gfx1101. You likely were overriding the gfx version to 11.0.0. However, that is not safe. The gfx1100 ISA has more registers than the gfx1101 ISA and there are other important differences in the ABI too. With Navi 21/22/23/24, the gfx version override approach more or less worked, despite not being officially supported. Users execute code built for Navi 21 on any of those chips and I don't know of any problems encountered from doing so. The compiler handled each of those ISAs identically. Navi 31/32/33 are not like that. There are known differences between those chips that the compiler is accounting for when it generates code for each architecture. (This isn't the cause of the specific TensileLibrary.dat error you encountered, but it's a warning that you may encounter other problems even once the Tensile issue is resolved, if you're using that override.) |
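The compatibility notes in this thread can be condensed into a small lookup. This is only a sketch of what the comments say, not an official support matrix: the table and function names are hypothetical, the gfx1032 entry follows the "1032 platform" remark further down, and the gfx1034 entry for Navi 24 is an assumption. No RDNA 3 substitution is listed because the comment above describes running gfx1100 code on gfx1101 as unsafe.

```python
# Substitutions described in this thread as workable, though not officially supported.
KNOWN_FALLBACKS = {
    "gfx1031": "gfx1030",  # Navi 22: described as identical to gfx1030 in all but name
    "gfx1032": "gfx1030",  # Navi 23
    "gfx1034": "gfx1030",  # Navi 24 (assumed name)
    "gfx1013": "gfx1010",  # gfx1013 is described as a superset of gfx1010
}


def usable_library_arch(arch, packaged):
    """Pick which packaged architecture to load for `arch`, or None if there is none.

    `packaged` is the collection of gfx names that the installed rocBLAS ships
    Tensile libraries for.
    """
    if arch in packaged:
        return arch
    fallback = KNOWN_FALLBACKS.get(arch)
    if fallback in packaged:
        return fallback
    return None
```

With the ROCm 5.6 listing shown earlier, `usable_library_arch("gfx1031", ...)` resolves to gfx1030, while gfx1013 resolves to nothing because no gfx1010 library is packaged.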
@cgmb Thanks for the ISA incompatibility heads-up for Navi 31/32/33. Good to know. I actually had just started going through the RDNA 3 ISA doc, but did not notice any chip-specific differences called out so far. Is there other documentation I should review, or will there eventually be updates to highlight differences? Since this is off-topic for this issue, is there a better place to follow (or open) an issue regarding documentation? |
JFYI, I got Stable Diffusion automatic working with ROCm 5.7 on a Phoenix APU (7840U) by setting it to 11.0.0
Without this override I got
|
For other architectures such as gfx1103, I think the right way to use them is to generate a new TensileLibrary.dat file to get optimal performance. Do we have a way to trigger this process? |
@hiepxanh sure I will push to see if it can get reviewed sooner rather than later. |
Tried to get my 6650 XT to work with llama.cpp by installing rocm-hip-sdk and got the same error after, I think, it failed to build properly on first launch:
Launching through the GPU again just gives me the last error now. |
@NaturalHate, build for gfx1030 and run with |
If I have to build it myself then I guess I'll pass. |
No, I can send it to you if you use an RX 6600; a lot of people have already built it. Just copy and paste it and it runs. |
I don't. I use a 6650 XT. |
@hiepxanh Hey, taking a moment to thank you :) I use an RX 6600 XT and the environment variable saved me! @NaturalHate I'm no expert on hardware, but from your error message the architecture is |
@NaturalHate LostRuins/koboldcpp#441 He gave me this file on koboldcpp and it works; you can try it since it is the same 1032 platform. @wayneyaoo you are welcome. I dug into this a lot and I think I should save others the time; this issue is really frustrating. |
Thanks again for bringing this issue to our attention. We noticed that there hasn't been any activity on this issue for a while. To keep our issue tracker clean and focused on active matters, we will be closing this issue if there is no further activity within the next week. If you still require assistance or believe this issue needs to remain open, please provide any additional information or updates at your earliest convenience. I suggest opening related issues in ROCm/ROCm, as they concern general hardware compatibility and support. Thank you for your understanding and cooperation. |
@mahmoodw, thanks for keeping an eye on stale issues. I think this one is just waiting for ROCm 6.2, but that doesn't mean the issue no longer exists. On NixOS 23.11 I was able to use a workaround that |
Could you please help me? My GPU is a 6600! |
Having an identical issue with my RX 5700XT |
My card is an RX 6600, and using koboldcpp ROCm https://github.com/YellowRoseCx/koboldcpp-rocm worked. If you want the TensileLibrary, you can copy the zip file @younijia. You can use the same method too, since the RX 6600 XT uses the same arch @Extocine |
To add on to @mahmoodw's statement, I'd like to ask those of you who are experiencing an issue similar to the original reporter's to please open a new issue rather than adding to this one. There are too many individual commenters reporting similar issues on this ticket, with a variety of workloads and supported and unsupported hardware. We will be happy to help you resolve the issues you're encountering, but managing several threads of conversation to solve what might be multiple unrelated issues is untenable here. I will be closing this one. Thank you for your understanding. |
Describe the bug
Basically getting some form of this error, either
rocBLAS error: Cannot read /opt/rocm-5.4.0/lib/rocblas/library/TensileLibrary.dat: Illegal seek
or Cannot read TensileLibrary.dat: No such file or directory
To Reproduce
rocblas-dev (= 2.46.0.50400-72~22.04)
Steps to reproduce the behavior:
Expected behavior
No error?
Log-files
Environment
Make sure that ROCm is correctly installed and to capture detailed environment information run the following command:
Getting this error: ```No LSB modules are available.