6700XT/6800M Gfx1031 libraries for compilation of Kobold.cpp #441
Comments
Just pinging @YellowRoseCx as they might find it useful to know. |
it also works when done the same for rx6600 with gfx1032, speaks gibberish too (probably easy to fix) |
Following this https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/comment/jzmvbfc/?context=3 from step 8 and then copying the libraries or using make_pyinst_rocm_hybrid_henk_yellow.bat works for me with 6750XT. Without mmq it gives 6ms/token processed and 40ms/token generated with all layers in the gpu for a 13b model using the YellowRose fork. |
when running cmake --install build\release --prefix "C:\Program Files\AMD\ROCm\5.5" i get: CMake Error: Error processing file: C:/Windows/System32/rocBLAS/build/release/cmake_install.cmake |
Are you CDing to the rocBlas folder before running that? |
You need to run it in x64 native as admin INSIDE the rocBLAS directory |
yes im in C:/Windows/System32/rocBLAS is that right? |
git clone https://github.com/ROCmSoftwarePlatform/rocBLAS cd rocBLAS Inside that directory, where you clone the rocBLAS repo. |
yes thats exactly what i did |
But at first you clone the repos just in plain cmd. Every time you open a x64 native tools command prompt you need to go to the original directory where you cloned the rocBLAS repo the first time. Don't do that in system32, don't do anything in system32. |
i dont know if i cant explain it or i am just dumb, but i cloned the repo in cmd, the repo cloned into system32 directory because its default, and everytime i open x64 native tools command prompt i cd into rocblas folder that i cloned |
Don't do it in system32, do it in c: or in other dir, never do anything in system32. Repeat all the steps cloning repos in c: or wherever you want, but never inside OS dirs. |
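A small guard along these lines could catch the System32 mistake before any cloning or building starts. This is only a sketch: the function name `is_inside_os_dir` and the default `C:\Windows` root are assumptions made for illustration and are not part of the guide or of any build script.

```python
from pathlib import PureWindowsPath


def is_inside_os_dir(path, os_root=r"C:\Windows"):
    """Return True if path is os_root or lies below it (case-insensitive)."""
    # PureWindowsPath parses Windows-style paths on any host OS.
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    root = [p.lower() for p in PureWindowsPath(os_root).parts]
    # The path is inside the root when its leading components match the root.
    return parts[:len(root)] == root


if __name__ == "__main__":
    for candidate in (r"C:\Windows\System32\rocBLAS", r"C:\rocBLAS"):
        verdict = "avoid" if is_inside_os_dir(candidate) else "ok"
        print(candidate, "->", verdict)
```

The comparison is done on path components rather than on raw strings, so a folder such as `C:\WindowsStuff` is not mistaken for something under `C:\Windows`.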
well now im getting: |
Check all the steps from here https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/guide_build_llamacpp_on_windows_with_amd_gpus_and/ and set the system environment variables. |
I've been trying all day, I think there's some GPUs that just don't have proper windows support yet. I've got the gfx1031 to compile when that's the only GPU arch, but as soon as I compile it with others like the gfx1030, it builds successfully but my outputs become garbled nonsense and are wrong. I even tried building them separately then adding the files together to the same folder and it didn't work. |
Mine is gfx1031, 6750xt and is working like a charm, compiled myself your repo and then compile rocBLAS and Tensile with the patch, but rocBLAS and Tensile just for gfx1031. I think you don't need to compile rocBLAS and Tensile for gpus that are supported like 1030, just for unsupported gpus. |
So when you built the library for gfx1031, you dragged those files into the rocblas folder inside the koboldcpp-rocm folder? When it asked you if you wanted to replace files, did you? Because when I replace those files is when it breaks support for me. Files like |
Yes, all files in C:\Program Files\AMD\ROCm\5.5\bin\rocblas\library inside your repo. Just check you don't have lazy and non-lazy together. Here ggerganov#1087 (comment) explain how to create non lazy, if you use non-lazy you need to remove TensileLibrary_lazy_gfx1031.dat. If you want i can pack the files compiled and send it to you. Update: I tried to compile for gfx1032 and I can't, it looks like you can't compile for a gpu you don't have. |
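The "don't mix lazy and non-lazy" check could be automated roughly like this. It is a sketch under assumptions: the lazy file name pattern `TensileLibrary_lazy_<arch>.dat` comes from the comment above, while treating a plain `TensileLibrary.dat` as the marker of a non-lazy merged build is an assumption based on the file names mentioned elsewhere in this thread. The function names are hypothetical.

```python
from pathlib import Path


def lazy_and_non_lazy(library_dir, arch):
    """Return (lazy_present, non_lazy_present) for one arch in library_dir."""
    folder = Path(library_dir)
    # Lazy-loading build: per-arch index file, name taken from the thread.
    lazy_present = (folder / f"TensileLibrary_lazy_{arch}.dat").exists()
    # Assumption: a non-lazy merged build ships a single TensileLibrary.dat.
    non_lazy_present = (folder / "TensileLibrary.dat").exists()
    return lazy_present, non_lazy_present


def has_lazy_conflict(library_dir, arch):
    """True when both variants sit in the same folder for this arch."""
    lazy_present, non_lazy_present = lazy_and_non_lazy(library_dir, arch)
    return lazy_present and non_lazy_present
```

Calling `has_lazy_conflict(r"C:\Program Files\AMD\ROCm\5.5\bin\rocblas\library", "gfx1031")` before building would flag the situation described above, where `TensileLibrary_lazy_gfx1031.dat` has to be removed if the non-lazy files are used.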
Hi, Sorry I was being a bit sick in the past few days. Here are the lazy and non-lazy versions of the libraries (might've gotten the names swapped) @YellowRoseCx @Drake-AI It sucks that you aren't able to build it for something that you don't have! But like you said, for compilation, you need to have either one in the folder, not both. Maybe we can crowd-source the Tensile libraries in a repo? That way people can pull them down whilst they are compiling or even use with other applications, given that ROCm can run on windows? |
I have a RX6600 and some ability to try and compile those tensile libs, after previously trying to get the WIP MIOpen Windows Port compiled. (Which was absolutely a struggle to even get partway there) |
how did you make that non-lazy version? When I followed that guy's instructions, it enables lazy-library loading and I always got "lazy" files These are what my results were:
for some reason I've only been able to generate lazy files, even with modifying rmake.py to disable lazy-library-loading and when I make multiple kernels at the same time, like: |
You need to cd into the rocblas folder where your rdeps and rmake files are, in a x64 Native Tools command prompt as admin and then do:
.\build\release\virtualenv\Scripts\activate.bat
TensileCreateLibrary --architecture YOUR_GPU_ARCHS --code-object-version default --merge-files --library-format msgpack .\library\src\blas3\Tensile\Logic\asm_full C:\SomeOutputFolder HIP
You're going to change the GPU arch and SomeOutputFolder per your need.
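For anyone scripting this step, the command above can be assembled from its two variable parts. This is a minimal sketch: the helper name `tensile_create_library_args` is hypothetical, and every flag and path is copied from the command in this comment rather than checked against Tensile's documentation. The arch string is passed through unchanged, so whatever format TensileCreateLibrary expects for several archs is left to the caller.

```python
def tensile_create_library_args(
    gpu_archs,
    output_dir,
    logic_dir=r".\library\src\blas3\Tensile\Logic\asm_full",
):
    """Build the argument list for the TensileCreateLibrary call quoted above."""
    if not gpu_archs:
        raise ValueError("a GPU arch such as 'gfx1031' is required")
    return [
        "TensileCreateLibrary",
        "--architecture", gpu_archs,          # replaces YOUR_GPU_ARCHS
        "--code-object-version", "default",
        "--merge-files",
        "--library-format", "msgpack",
        logic_dir,                            # logic files inside the rocBLAS checkout
        output_dir,                           # replaces C:\SomeOutputFolder
        "HIP",
    ]


if __name__ == "__main__":
    # Only prints the command; it does not run it.
    print(" ".join(tensile_create_library_args("gfx1031", r"C:\SomeOutputFolder")))
```

The list would still have to be run from the rocBLAS folder with the virtualenv activated, for example by passing it to `subprocess.run(..., check=True)` in that environment.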
On Tue, Sep 19, 2023, 6:17 PM YellowRoseCx wrote:
Hi, Sorry I was being a bit sick in the past few days. Here are the lazy and non-lazy versions of the libraries (might've gotten the names swapped) @YellowRoseCx
lazy_gfx1031.zip <https://github.com/LostRuins/koboldcpp/files/12663973/lazy_gfx1031.zip>
non_lazy_gfx1031.zip <https://github.com/LostRuins/koboldcpp/files/12663974/non_lazy_gfx1031.zip>
@Drake-AI It sucks that you aren't able to build it for something that you don't have! But like you said, for compilation, you need to have either one in the folder, not both. Maybe we can crowd-source the Tensile libraries in a repo? That way people can pull them down whilst they are compiling or even use with other applications, given that ROCm can run on windows?
how did you make that non-lazy version? When I followed that guy's instructions, it enables lazy-library loading and I always got "lazy" files. These are what my results were:
gfx1032;gfx1033;gfx1034;gfx1035 = Compile Errors
gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1100;gfx1101;gfx1102 = Built but not working altogether
for some reason I've only been able to generate lazy files, even with modifying rmake.py to disable lazy-library-loading
|
I did that and have tried it with lazy and non lazy files. As soon as I try to build kernels for more than 1 GPU, even if it's something like GFX906 and GFX1030 that come with Windows ROCm, my output becomes garbled on a 6800xt If one of you can try building for multiple kernels you might see what I mean I have a GFX1030, GFX900, and GFX1010 I can test with |
If I download the current build here and compile it with make LLAMA_HIPBLAS=1 -j4, then I also get garbled output on my RX6650 XT, but with Linux. Sometimes the EOS triggers right at the beginning and it doesn't output anything at all, sometimes it fills the range of the specified max tokens and spams me with one word. Something seems to be broken since the last builds of llama.cpp, at least for me. Strangely enough, when I use your rocm build it still works, older llama versions also work, which I find strange. So this doesn't just seem to affect Windows. Never had any problems with it under Linux before. Also with ROCm 5.7.0, exactly the same issue. |
try using 2 or 3 less layers than the maximum the model has |
I did with my laptop's integrated graphics a while ago (it was gfx90c), but ROCm doesn't support APUs and I wasn't able to compile for others. |
Works great on This is how I did it:
I didn't have to replace any files in the rocblas\library folder. The files added were missing. @YellowRoseCx Would it be possible to have the .exe build with these files added? Thanks all 🙌 |
see if this works for you: https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr1-ROCm I'm gonna have to test it myself to make sure adding those didnt mess up other cards tho |
It does work. |
EDIT: I wrote in more detail on this issue; #655 With my rx 6600(gfx1032) I couldn't compile it as "lazy" no matter what I did. ggerganov#1087 (comment) I was able to compile it as "non-lazy merged library" as done here. I get very good results on windows, I couldn't see any difference between the speed I get when using linux. I downloaded the koboldcpp_rocm_files.zip file from https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr0-ROCm. I put the Kernels.so-000-gfx1032.hsaco and TensileLibrary.dat files in the rocblas/library folder(I also put it under "AMD\ROCm\5.5\bin\rocblas\library") I am attaching the files I compiled for gfx1032. EDIT: After some experimentation, I think that Linux probably produces faster results. I won't know until I do extensive testing, of course |
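Dropping compiled files into `rocblas/library` without overwriting what is already there could be sketched as below. The helper name `add_missing_files` is hypothetical, and the "copy only what is missing" behaviour is an assumption modelled on the earlier comment that no existing files had to be replaced.

```python
import shutil
from pathlib import Path


def add_missing_files(src_dir, library_dir):
    """Copy files from src_dir into library_dir, skipping names that already exist."""
    src, dst = Path(src_dir), Path(library_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        target = dst / item.name
        # Never overwrite: files shipped with ROCm or koboldcpp stay untouched.
        if item.is_file() and not target.exists():
            shutil.copy2(item, target)
            copied.append(item.name)
    return copied
```

With the gfx1032 files from this comment, the returned list would be expected to name `Kernels.so-000-gfx1032.hsaco` and `TensileLibrary.dat` when neither was present beforehand, and to be empty on a second run.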
Adding them into KoboldCpp-ROCm 1.57.1.yr1, hopefully everything works as intended xD |
Yay! I'm happy that we are working around hacky ways to get around ROCm's weird limitations! |
Expected Behavior
gfx1031.zip
Hi! I've compiled the Gfx1031 tensile libraries for use with 6800M, which would also work with 6700XT, given that they are the same ISA.
This is based on the comment here: ggerganov#1087 (comment)
from which I generated the (old) non-lazy merged library format. This alleviates the gibberish issue with using gfx1030. I had also initially copied gfx1030 to gfx1031 (ref here: ggerganov#1087 (comment)), but the responses from llama.cpp wouldn't make any sense.
Current Behavior
As it stands, copying gfx1030 to gfx1031 outputs gibberish at times; the attached libraries should allow non-gibberish, sensible inference.
Environment and Context
I have a laptop with 6800M (gfx1031) running windows 10