Releases: YellowRoseCx/koboldcpp-rocm
KoboldCPP-v1.52.2.yr0-ROCm
https://github.com/LostRuins/koboldcpp/releases/tag/v1.52.2
- NEW: Added a new bare-bones KoboldCpp NoScript WebUI, which does not require Javascript to work. It should be W3C HTML compliant and should run on every browser from the last 20 years, even text-based ones like Lynx (e.g. in a terminal over SSH). It is accessible by default at /noscript, e.g. http://localhost:5001/noscript . This can be helpful when running KoboldCpp from systems that do not support a modern browser with Javascript.
- Partial per-layer KV offloading is now merged for CUDA. Important: this means that the number of layers you can offload to GPU might be reduced, as each layer now takes up more space. To avoid per-layer KV offloading, use the --usecublas lowvram option (equivalent to -nkvo in llama.cpp; shown in the sketch after this list). Fully offloaded models should behave the same as before.
- The /api/extra/tokencount endpoint now also returns an array of token IDs from the tokenizer in the response body (see the sketch after this list).
- Merged support for QWEN and Mixtral from upstream. Note: Mixtral seems to perform large-batch prompt processing extremely slowly; this is probably an implementation issue. For now, you might have better luck using --noblas or setting --blasbatchsize -1 when running Mixtral.
- Selecting a .kcpps in the GUI when choosing a model will load the model specified inside that config file instead.
- Added the Mamba Multitool script (from @henk717). This is a shell script that can be used in Linux to setup an environment with all dependencies required for building and running KoboldCpp on Linux.
- Improved KCPP Embedded Horde Worker fault tolerance: it should now gracefully back off for increasing durations whenever it encounters errors polling AI Horde, and will automatically recover from up to 24 hours of Horde downtime.
- Added a new field to the /api/extra/perf endpoint showing the number of Horde Worker errors; this can be used to monitor whether your embedded Horde worker has gone down (also shown in the sketch below).
- Pulled other fixes and improvements from upstream, updated Kobold Lite, added asynchronous file autosaves (thanks @aleksusklim), various other improvements.
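As a concrete shell sketch of the new options and endpoints above: the model filenames and prompt are placeholders, and the exact request shapes for the two API endpoints (POST with a "prompt" field for tokencount, plain GET for perf) are assumptions based on this changelog, not confirmed API documentation.

# launch with the KV cache kept out of VRAM, per the offloading note above
./koboldcpp.exe --usecublas lowvram --gpulayers 45 model.Q8_0.gguf
# Mixtral workaround: disable large-batch prompt processing
./koboldcpp.exe --blasbatchsize -1 mixtral.Q4_K_M.gguf
# browse the NoScript WebUI from a text-only terminal
lynx http://localhost:5001/noscript
# count tokens; the response should include the count plus the new token ID array
curl -s -X POST http://localhost:5001/api/extra/tokencount -H "Content-Type: application/json" -d '{"prompt":"Hello world"}'
# performance stats, now including the Horde Worker error counter
curl -s http://localhost:5001/api/extra/perf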
Hotfix 1.52.1: Fixed 'not enough memory' loading errors for large (20B+) models. See #563
NEW: Added Linux PyInstaller binaries
Hotfix 1.52.2: Merged fixes for Mixtral prompt processing.

To use, download and run koboldcpp.exe, which is a one-file PyInstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try koboldcpp_rocm at YellowRoseCx's fork here.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI. Once the model is loaded, you can connect at the address below (or use the full KoboldAI client):
http://localhost:5001/
For more information, be sure to run the program from command line with the --help flag.
Windows KoboldCPP-ROCm v1.43.exe
Windows Compiled KoboldCPP with ROCm support!
I want to thank @LostRuins for making KoboldCPP and for general guidance, @henk717 for all his dedication to KoboldAI that brought us here in the first place, and @SlyEcho, who originally started the ROCm port of llama.cpp.
You need ROCm to build it, but not to run it: https://rocm.docs.amd.com/en/latest/deploy/windows/quick_start.html
Compiled for the GPUs that have Tensile libraries / are marked as supported: gfx906, gfx1030, gfx1100, gfx1101, gfx1102
To run, open the exe, or start it via the command line.
Example:
./koboldcpp_rocm.exe --usecublas normal mmq --threads 1 --stream --contextsize 4096 --usemirostat 2 6 0.1 --gpulayers 45 C:\Users\YellowRose\llama-2-7b-chat.Q8_0.gguf
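(In this example, --usecublas normal mmq selects the hipBLAS backend with quantized matrix-multiply kernels, --contextsize 4096 sets the maximum context length, --usemirostat 2 6 0.1 enables Mirostat 2.0 sampling with tau=6 and eta=0.1, --gpulayers 45 offloads 45 layers to the GPU, and --stream enables streamed output; substitute the path to your own .gguf model.)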
This site may be useful; it has some patches for Windows ROCm that I used to help with compilation, though I'm not sure whether they're necessary: https://streamhpc.com/blog/2023-08-01/how-to-get-full-cmake-support-for-amd-hip-sdk-on-windows-including-patches/
Build commands used (ROCm required):
cd koboldcpp-rocm
mkdir build && cd build
cmake .. -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLAMA_HIPBLAS=ON -DCMAKE_C_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/AMD/ROCm/5.5/bin/clang++.exe" -DAMDGPU_TARGETS="gfx906;gfx1030;gfx1100;gfx1101;gfx1102"
cmake --build . -j 6
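(For reference: -DLLAMA_HIPBLAS=ON enables the hipBLAS/ROCm backend, -DAMDGPU_TARGETS lists the GPU architectures to generate kernels for, and the clang/clang++ paths point at the compilers bundled with ROCm 5.5; adjust those paths if your ROCm is a different version or installed elsewhere.)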
That puts koboldcpp_cublas.dll inside .\koboldcpp-rocm\build\bin.
Copy koboldcpp_cublas.dll to the main koboldcpp-rocm folder.
(You can run koboldcpp.py right away like this.)
To make it into an exe, we use make_pyinst_rocm_hybrid_henk_yellow.bat.
That file is set up to add CLBlast and OpenBLAS too; you can either remove those lines so it's just this code:
cd /d "%~dp0"
copy "C:\Program Files\AMD\ROCm\5.5\bin\hipblas.dll" .\ /Y
copy "C:\Program Files\AMD\ROCm\5.5\bin\rocblas.dll" .\ /Y
xcopy /E /I "C:\Program Files\AMD\ROCm\5.5\bin\rocblas" .\rocblas\
PyInstaller --noconfirm --onefile --collect-all customtkinter --clean --console --icon ".\niko.ico" --add-data "./klite.embd;." --add-data "./koboldcpp_cublas.dll;." --add-data "./hipblas.dll;." --add-data "./rocblas.dll;." --add-data "./rwkv_vocab.embd;." --add-data "./rocblas;." --add-data "C:/Windows/System32/msvcp140.dll;." --add-data "C:/Windows/System32/vcruntime140_1.dll;." "./koboldcpp.py" -n "koboldcppRocm.exe"
Or you can download w64devkit, cd into the folder, and run make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1 -j4 to build the rest of the backend files.
Once they're all built, you should be able to run make_pyinst_rocm_hybrid_henk_yellow.bat as-is, and it will bundle the files together into koboldcppRocm.exe in the \koboldcpp-rocm\dists folder.
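As a quick sanity check (assuming the default paths above), the bundled exe should print its launch options when run from the repo folder:

.\dists\koboldcppRocm.exe --help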
KoboldCPP-v1.52.1.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.52.yr0-ROCm
Various new features including new model Mixtral support
Mixtral-Kcpp-v1.52.RC1.yr1-ROCm FanService Ed.
Unofficial release candidate build containing experimental features and Mixtral Model support
KoboldCPP-v1.51.1.yr1-ROCm
Now includes the full build featuring hipBLAS (ROCm), CLBlast, OpenBLAS, No BLAS, and two backup backends
KoboldCPP-v1.50.1.yr1-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.50.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-1.48.1.yr2-ROCm
Merge remote-tracking branch 'upstream/concedo'
KoboldCPP-v1.47.2.yr0-ROCm
Merge remote-tracking branch 'upstream/concedo'