6700XT/6800M Gfx1031 libraries for compilation of Kobold.cpp #441

harish0201 · 2023-09-17T05:41:37Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
[x] I carefully followed the README.md.
[x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Select Topic Area

Product Feedback

Body

gfx1031.zip

Hi! I've compiled the gfx1031 Tensile libraries for use with the 6800M. They should also work with the 6700 XT, since both use the same ISA.

This is based on the comment here: ggerganov#1087 (comment), following which I generated the libraries in the (old) non-lazy, merged library format. This avoids the gibberish output you get when using the gfx1030 libraries. I had initially copied the gfx1030 files to gfx1031 (ref: ggerganov#1087 (comment)), but the responses from llama.cpp didn't make any sense.

Current Behavior

As it stands, copying the gfx1030 libraries to gfx1031 sometimes produces gibberish output. The attached libraries should allow sensible, non-gibberish inference.

Environment and Context

I have a laptop with a 6800M (gfx1031) running Windows 10.

LostRuins · 2023-09-17T07:29:29Z

Just pinging @YellowRoseCx as they might find it useful to know.

kowierczyk · 2023-09-18T15:00:50Z

It also works when done the same way for the RX 6600 with gfx1032, but it outputs gibberish too (probably easy to fix).

Drake-AI · 2023-09-18T16:23:13Z

Following this guide https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/comment/jzmvbfc/?context=3 from step 8, and then either copying the libraries or using make_pyinst_rocm_hybrid_henk_yellow.bat, works for me with a 6750 XT. Without MMQ I get 6 ms/token for prompt processing and 40 ms/token for generation, with all layers on the GPU, for a 13B model using the YellowRose fork.
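Since many benchmarks report throughput rather than latency, the per-token figures above can be converted to tokens per second. A minimal sketch; the helper name is illustrative, and the inputs are just the two latencies quoted above:

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens per second."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    return 1000.0 / ms_per_token


# Latencies reported above: 13B model, 6750 XT, all layers on the GPU, no MMQ.
prompt_tps = tokens_per_second(6)  # prompt processing
gen_tps = tokens_per_second(40)    # generation
print(f"prompt processing: {prompt_tps:.1f} tok/s, generation: {gen_tps:.1f} tok/s")
```

This prints about 166.7 tok/s for prompt processing and 25.0 tok/s for generation.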

kowierczyk · 2023-09-18T17:34:44Z

When running cmake --install build\release --prefix "C:\Program Files\AMD\ROCm\5.5" I get:
CMake Error: Error processing file: C:/Windows/System32/rocBLAS/build/release/cmake_install.cmake

YellowRoseCx · 2023-09-18T17:42:34Z

Are you cd-ing into the rocBLAS folder before running that?

Drake-AI · 2023-09-18T17:43:04Z

You need to run it in the x64 Native Tools Command Prompt, as admin, INSIDE the rocBLAS directory.

kowierczyk · 2023-09-18T17:45:28Z

Yes, I'm in C:/Windows/System32/rocBLAS; is that right?
Or is it supposed to be some rocBLAS folder inside ROCm 5.5 in Program Files?

Drake-AI · 2023-09-18T17:48:27Z

git clone https://github.com/ROCmSoftwarePlatform/rocBLAS

cd rocBLAS

Inside that directory, the one where you cloned the rocBLAS repo.

kowierczyk · 2023-09-18T17:56:02Z

Yes, that's exactly what I did.
The default directory is System32, so that's where my rocBLAS folder was cloned.

Drake-AI · 2023-09-18T17:59:23Z

But you first clone the repos in a plain cmd window. Every time you open an x64 Native Tools Command Prompt, you need to go back to the directory where you originally cloned the rocBLAS repo. Don't do that in System32; don't do anything in System32.

kowierczyk · 2023-09-18T18:03:10Z

I don't know if I'm explaining it badly or I'm just dumb, but I cloned the repo in cmd, it was cloned into the System32 directory because that's the default, and every time I open the x64 Native Tools Command Prompt I cd into the rocBLAS folder that I cloned.

Drake-AI · 2023-09-18T18:08:18Z

Don't do it in System32; do it in C:\ or another directory. Never do anything in System32. Repeat all the steps, cloning the repos into C:\ or wherever you want, but never inside OS directories.
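One way to catch this mistake before a long build is to check that the clone directory is not under the Windows folder. A minimal sketch, assuming Windows is installed in C:\Windows; the helper name is hypothetical, and `pathlib.PureWindowsPath` lets it run on any OS:

```python
from pathlib import PureWindowsPath

# Assumption: Windows is installed in C:\Windows (the usual default).
WINDOWS_DIR = PureWindowsPath(r"C:\Windows")


def is_safe_build_dir(path: str) -> bool:
    """Return False if `path` is the Windows folder or anything beneath it (e.g. System32)."""
    p = PureWindowsPath(path)
    # Windows path comparisons are case-insensitive, so c:/windows/system32 is caught too.
    return p != WINDOWS_DIR and WINDOWS_DIR not in p.parents


print(is_safe_build_dir("C:/Windows/System32/rocBLAS"))  # False
print(is_safe_build_dir(r"D:\rocBLAS"))                  # True
```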

kowierczyk · 2023-09-18T20:37:10Z

Well, now I'm getting:
[4/250] Running utility command for TENSILE_LIBRARY_TARGET
FAILED: library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.util
library\src\CMakeFiles\TENSILE_LIBRARY_TARGET.dir\utility.bat 8b1baf77e970c57e
Error copying file (if different) from "D:\rocBLAS\build\release\Tensile\library\TensileLibrary_Type_CC_Contraction_l_Ailk_Bljk_Cijk_Dijk_fallback.dat" to "D:/rocBLAS/build/release/Tensile/library".
Batch file failed at line 3 with errorcode 1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "D:\rocBLAS\rmake.py", line 445, in <module>
    main()
  File "D:\rocBLAS\rmake.py", line 438, in main
    if run_cmd(exe, opts):
       ^^^^^^^^^^^^^^^^^^
  File "D:\rocBLAS\rmake.py", line 406, in run_cmd
    proc = subprocess.run(program, check=True, stderr=subprocess.STDOUT, shell=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ninja.exe -j 12 all' returned non-zero exit status 1.

Drake-AI · 2023-09-19T11:43:27Z

Check all the steps from here: https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/guide_build_llamacpp_on_windows_with_amd_gpus_and/ and set the system environment variables.

YellowRoseCx · 2023-09-19T12:02:15Z

I've been trying all day; I think some GPUs just don't have proper Windows support yet. I got gfx1031 to compile when it's the only GPU arch, but as soon as I compile it together with others like gfx1030, it builds successfully but my outputs become garbled nonsense. I even tried building them separately and then putting the files together in the same folder, and it didn't work.

Drake-AI · 2023-09-19T12:17:15Z

Mine is a gfx1031 (6750 XT) and it's working like a charm. I compiled your repo myself and then compiled rocBLAS and Tensile with the patch, but only for gfx1031. I think you don't need to compile rocBLAS and Tensile for GPUs that are already supported, like gfx1030; only for unsupported GPUs.

YellowRoseCx · 2023-09-19T12:31:02Z

So when you built the library for gfx1031, you dragged those files into the rocblas folder inside the koboldcpp-rocm folder? When it asked whether you wanted to replace files, did you? Replacing those files is what breaks support for me, files like TensileLibrary_Type_CC_Contraction_l_AlikC_BjlkC_Cijk_Dijk_fallback.dat

Drake-AI · 2023-09-19T12:36:43Z

Yes, all the files, in C:\Program Files\AMD\ROCm\5.5\bin\rocblas\library inside your repo. Just check that you don't have lazy and non-lazy libraries together. This comment explains how to create the non-lazy version: ggerganov#1087 (comment). If you use non-lazy, you need to remove TensileLibrary_lazy_gfx1031.dat. If you want, I can pack the compiled files and send them to you.
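The "lazy or non-lazy, not both" rule can be checked mechanically before launching. A minimal sketch; it assumes the lazy index is named TensileLibrary_lazy_<arch>.dat and the non-lazy merged index is TensileLibrary.dat, the file names used in this thread. It is a simplification: a shared TensileLibrary.dat built for another arch would also be flagged. The helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def check_library_dir(library_dir: str, arch: str) -> list[str]:
    """Report problems in a rocBLAS library folder for one GPU architecture.

    Assumption: the lazy index is TensileLibrary_lazy_<arch>.dat and the non-lazy
    merged index is TensileLibrary.dat, the file names used in this thread.
    """
    folder = Path(library_dir)
    lazy = folder / f"TensileLibrary_lazy_{arch}.dat"
    non_lazy = folder / "TensileLibrary.dat"
    problems = []
    if lazy.exists() and non_lazy.exists():
        problems.append(f"both {lazy.name} and {non_lazy.name} present: keep only one")
    elif not lazy.exists() and not non_lazy.exists():
        problems.append(f"no Tensile library index found for {arch}")
    return problems


# Demo on a throwaway folder that has both formats side by side.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "TensileLibrary.dat").touch()
    Path(tmp, "TensileLibrary_lazy_gfx1031.dat").touch()
    print(check_library_dir(tmp, "gfx1031"))
```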

Update: I tried to compile for gfx1032 and couldn't; it looks like you can't compile for a GPU you don't have.

harish0201 · 2023-09-19T15:52:17Z

Hi,

Sorry, I've been a bit sick for the past few days.

@YellowRoseCx Here are the lazy and non-lazy versions of the libraries (I might have gotten the names swapped):
lazy_gfx1031.zip
non_lazy_gfx1031.zip

@Drake-AI It sucks that you aren't able to build it for a GPU you don't have! But as you said, for compilation you need to have one or the other in the folder, not both.

Maybe we can crowd-source the Tensile libraries in a repo? That way people could pull them down while compiling, or even use them with other applications, given that ROCm can run on Windows.

Foxlum · 2023-09-19T17:14:47Z

I have an RX 6600 and some experience trying to compile those Tensile libs, having previously tried to get the WIP MIOpen Windows port compiled (which was an absolute struggle to get even partway through).

YellowRoseCx · 2023-09-19T22:17:03Z

How did you make that non-lazy version? When I followed that guy's instructions, lazy library loading got enabled and I always ended up with "lazy" files.

These were my results:

gfx1032;gfx1033;gfx1034;gfx1035 = Compile Errors

gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1100;gfx1101;gfx1102 = Built but not working altogether

For some reason I've only been able to generate lazy files, even after modifying rmake.py to disable lazy library loading.

And when I build multiple kernels at the same time, like:

>python rmake.py -a gfx1010;gfx1030 --no-lazy-library-loading --no-merge-architectures -t C:\Users\YellowRose\rocmbuild\Tensile

I get bad output like: 1 geprüft everybody everybody nobodyς everybody via everybody knows); surely⊕rrrsquitechunscientific article everybody getsislandingfordays everyoneverybodyettapeople, andr they getrustleaving the websiclose girls or aor the first time?

harish0201 · 2023-09-20T02:24:19Z

You need to cd into the rocBLAS folder where your rdeps and rmake files are, in an x64 Native Tools Command Prompt as admin, and then run:

.\build\release\virtualenv\Scripts\activate.bat
TensileCreateLibrary --architecture YOUR_GPU_ARCHS --code-object-version default --merge-files --library-format msgpack .\library\src\blas3\Tensile\Logic\asm_full C:\SomeOutputFolder HIP

Change the GPU arch and SomeOutputFolder to suit your needs.
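To script this step, for example to rebuild for a different card, the same TensileCreateLibrary invocation can be assembled from Python. A minimal sketch that only reuses the options quoted above; the helper names are illustrative, and joining several archs with ';' is an assumption borrowed from rmake.py's -a option. Given the garbled output reported for multi-arch builds in this thread, building one arch at a time is the safer default.

```python
import subprocess

# Logic directory from the command above, relative to the rocBLAS checkout.
LOGIC_DIR = r".\library\src\blas3\Tensile\Logic\asm_full"


def tensile_create_library_cmd(archs, output_dir, logic_dir=LOGIC_DIR):
    """Return the TensileCreateLibrary arguments for a non-lazy, merged msgpack library."""
    if isinstance(archs, str):
        archs = [archs]
    return [
        "TensileCreateLibrary",
        # Assumption: multiple archs are ';'-separated, as with rmake.py's -a option.
        "--architecture", ";".join(archs),
        "--code-object-version", "default",
        "--merge-files",
        "--library-format", "msgpack",
        logic_dir,
        output_dir,
        "HIP",
    ]


def run(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit, as rmake.py does."""
    subprocess.run(cmd, check=True)


cmd = tensile_create_library_cmd("gfx1031", r"C:\SomeOutputFolder")
print(" ".join(cmd))
# run(cmd)  # only inside the activated virtualenv, from the rocBLAS folder
```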

YellowRoseCx · 2023-09-20T03:32:48Z

I did that and tried it with both lazy and non-lazy files. As soon as I try to build kernels for more than one GPU, even ones like gfx906 and gfx1030 that come with Windows ROCm, my output becomes garbled on a 6800 XT.

If one of you could try building multiple kernels, you might see what I mean.

I have a gfx1030, a gfx900, and a gfx1010 I can test with.

ghost · 2023-09-23T11:54:03Z

If I download the current build here and compile it with make LLAMA_HIPBLAS=1 -j4, I also get garbled output on my RX 6650 XT, but on Linux. Sometimes EOS triggers right at the beginning and nothing is output at all; other times it fills the specified max tokens and spams me with a single word. Something seems to have broken in the latest llama.cpp builds, at least for me. Strangely enough, your ROCm build still works, and older llama.cpp versions work too. So this doesn't only affect Windows; I never had any problems with it under Linux before. ROCm 5.7.0 shows exactly the same issue.

YellowRoseCx · 2023-09-24T10:28:04Z

Try offloading 2 or 3 fewer layers than the model's maximum.
Fæth on Discord found that using 33 of 35 layers for 7B works, and 41 of 43 layers for 13B. There's apparently some issue with the last 2 extra layers that get added.

harish0201 · 2023-09-24T18:54:09Z

I tried it with my laptop's integrated graphics a while ago (gfx90c), but ROCm doesn't support APUs, and I wasn't able to compile for other architectures.

mahdiyari · 2024-01-30T23:17:06Z

Works great on a 6700 XT on Windows 10!

This is how I did it:

Get koboldcpp_rocm_files.zip
pip install customtkinter
Copy TensileLibrary.dat and Kernels.so-000-gfx1031.hsaco into rocblas\library (files from the original post)
python .\koboldcpp.py

I didn't have to replace any files in the rocblas\library folder; the files I added weren't there before.
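The copy step above can be scripted so that it never overwrites anything already in rocblas\library, matching the observation that no replacement was needed. A minimal sketch; the helper name is hypothetical, and the real source and destination would be the unzipped attachment and your koboldcpp-rocm rocblas\library folder (the demo uses throwaway folders instead):

```python
import shutil
import tempfile
from pathlib import Path

# gfx1031 files from the original post's attachment.
FILES = ["TensileLibrary.dat", "Kernels.so-000-gfx1031.hsaco"]


def copy_missing(src_dir, library_dir, names=FILES):
    """Copy each named file into library_dir unless it already exists there.

    Returns the names actually copied; existing files are never replaced.
    """
    dest = Path(library_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        target = dest / name
        if target.exists():
            print(f"skipping {name}: already present, not replacing")
            continue
        shutil.copy2(Path(src_dir) / name, target)
        copied.append(name)
    return copied


# Demo on throwaway folders: one file already present, one missing.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as lib:
    for name in FILES:
        Path(src, name).write_bytes(b"new")
    Path(lib, "TensileLibrary.dat").write_bytes(b"old")
    print(copy_missing(src, lib))
```

If a file is skipped, check which library format and architecture the existing copy belongs to before deciding to replace it.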

@YellowRoseCx Would it be possible to have the .exe build with these files added?

Thanks all 🙌

YellowRoseCx · 2024-01-31T01:45:03Z

See if this works for you: https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr1-ROCm

I'm gonna have to test it myself to make sure adding those didn't mess up other cards, though.

mahdiyari · 2024-01-31T09:13:39Z

It does work.

jasyuiop · 2024-02-01T10:03:29Z

EDIT: I wrote about this in more detail in issue #655.

With my RX 6600 (gfx1032) I couldn't compile it as "lazy" no matter what I did. I was able to compile it as a "non-lazy merged library", as done here: ggerganov#1087 (comment). I get very good results on Windows; I couldn't see any difference from the speed I get on Linux.

I downloaded the koboldcpp_rocm_files.zip file from https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr0-ROCm. I put the Kernels.so-000-gfx1032.hsaco and TensileLibrary.dat files in the rocblas/library folder (I also put them under "AMD\ROCm\5.5\bin\rocblas\library").

I am attaching the files I compiled for gfx1032.
gfx1032_none_lazy.zip

EDIT: After some experimentation, I think Linux probably produces faster results. I won't know until I do extensive testing, of course.

YellowRoseCx · 2024-02-10T19:34:52Z

Adding them into KoboldCpp-ROCm 1.57.1.yr1, hopefully everything works as intended xD
Thanks!

harish0201 · 2024-02-22T20:42:17Z

Yay! I'm happy that we're finding hacky ways to get around ROCm's weird limitations!

jasyuiop mentioned this issue on Feb 1, 2024: 6600/6600 XT/6650 XT gfx1032 libraries for compilation of Kobold.cpp #655 (Open)

hiepxanh mentioned this issue on Feb 28, 2024: [Bug]: rocBLAS error: Cannot read TensileLibrary.dat: No such file or directory ROCm/Tensile#1936 (Closed)

jeromew mentioned this issue on Jun 14, 2024: AMD - tinyBLAS windows prebuilt support stopped working with 0.8.5 Mozilla-Ocho/llamafile#441 (Closed)
