ROCm access violation #46
13 comments · 24 replies
-
I found the same with any model I tried. I went back to 1.61.2 and functionality returned to normal.
-
Yeah, the newest version is unusable for me right now.
-
Same on a 7900 XTX.
-
Same with me, access violation. Directly in the Kobold UI or through ST, if that matters at all.
-
"exception: access violation reading 0x00000007F115C100" |
-
Updated to 1.64.1, still the same error.
-
Same issue here: v1.64.1 does not work on my 6600 XT.
-
v1.64.1 - same issue, access violation during BLAS prompt processing. Tried various configurations with no success.
-
Updated to 1.65, still the same error. @YellowRoseCx any idea why this is happening?
-
Don't see anyone saying anything about the new version, so... Using v1.66.1.yr1-ROCm with an RX 7800 XT on Adrenalin 24.3.1 drivers on Windows 10 22H2. Settings, if relevant:

Mainly using this with SillyTavern. Captioning works in Kobold Lite but not in SillyTavern; for some reason it's trying to use "/api/openai/caption-image" with "KoboldCpp" selected (dunno if it's just wrong or they never updated it). Rewriting that to "/sdapi/v1/interrogate" gives consistently inaccurate results, so I just rewrote "/api/extra/caption" to "/sdapi/v1/interrogate" for "Local", which seems to be mostly accurate, not really sure why.

Trying to generate an image works in neither. It adds an "Unavailable" placeholder in Kobold Lite that I can't delete in any way other than clicking back or creating a new chat/story, and SillyTavern through its "SD.Next (Vladmandic)" preset actually starts(?) generating the image, but it hangs: no CPU usage, no GPU usage, idle GPU clocks and power, no errors or anything in the console. It just stops at "Generating (20 st.)" and is completely unresponsive. When lowering steps to something like 5, it throws an access violation and an error, then continues generating text like nothing happened:

Performance (Generate, not Process or Total) bounces between 25 ms/T and 165 ms/T, not very consistent, but seems slightly higher on average than before (I've been using one of the versions from March).
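For anyone else chasing the captioning routing, here is a minimal sketch for probing the two endpoints named above directly, outside of SillyTavern, to see which one actually answers. The port, the payload field name, and the assumption that /sdapi/v1/interrogate takes an A1111-style base64 image are mine, not something I've verified against the KoboldCpp source; test.png is just a placeholder image.

```python
# Hedged sketch: probe the caption endpoints mentioned above so you can tell
# whether the problem is KoboldCpp itself or SillyTavern's routing.
# Assumes KoboldCpp is listening on its default port 5001 and that
# /sdapi/v1/interrogate accepts an A1111-style {"image": "<base64>"} body.
import base64
import requests

BASE = "http://localhost:5001"  # adjust if you changed the port

with open("test.png", "rb") as f:  # any small test image
    img_b64 = base64.b64encode(f.read()).decode("ascii")

for path in ("/sdapi/v1/interrogate", "/api/extra/caption"):
    try:
        r = requests.post(BASE + path, json={"image": img_b64}, timeout=120)
        print(path, r.status_code, r.text[:200])
    except requests.RequestException as exc:
        print(path, "failed:", exc)
```

Whichever endpoint returns a sensible caption here is the one worth pointing SillyTavern at; if both fail from a plain script, the problem is on the KoboldCpp side rather than the routing.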
-
Processing Prompt [BLAS] (107 / 107 tokens)

This is with a Phi Medium model at Q6_K on a 6900 XT, fully loaded. This is on the latest release, 1.66, with all the default settings.
-
For the benefit of those who aren't in on the Discord discussion thread: just as an FYI, all of these crashes only happen in the Windows version. A lot of it seems to be related to upstream changes that possibly just aren't compatible with the older ROCm version 5.7 for Windows (whereas the Linux version is 6.0 or 6.1, depending on how up to date it is).

On Windows, FlashAttention seems to prevent several crashes (and saves a tiny bit of VRAM, with better savings as context increases). But FlashAttention costs a huge amount of prompt-processing performance while improving token generation speed somewhat (so whether it's better or worse depends on usage and will vary), and FlashAttention is awful if anything less than 100% of a model's layers are loaded onto the GPU (even generation performance drops to about 50% with a partial offload, even if you only put one layer on the CPU). On Linux, even MoE models seem to load fine without FlashAttention.

Despite the instructions on the readme page, turning mmq on seems to produce worse results in just about everything. (And it's usually better to offload fewer layers than to use the lowvram option.) @lemon07r Deagle on Discord suggests that mmq being on seems to cause the crash with that model, so try turning it off and see what happens. mmq may be optimized more for NVIDIA than for AMD, or maybe it changed for the worse in newer versions, because the results just aren't good. Though one person reported ever so slightly higher results on a 6600, so YMMV I guess, but generally assume mmq should be off.

Don't forget you can always run the benchmark. This works best via the command prompt with --benchmark "benchmark.csv" on the command line so it saves to a file. You can save all the settings in the GUI to a .kcpps file and open that in the prompt with --config filename.kcpps, and you can edit that file in a text editor (Notepad or whatever -- I suggest setting Explorer to not hide file extensions, as that setting has major repercussions across the entirety of Windows and should never have been on by default) to change benchmark from null to "benchmark.csv" (the quotes are important!). Then you can just run that .kcpps every time you want to benchmark something.
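Since the .kcpps file saved by the GUI appears to be plain JSON, a tiny script can flip that benchmark field instead of hand-editing it in Notepad. This is only a sketch under that assumption; the key name "benchmark" and the null default are taken from the description above, not checked against the KoboldCpp source.

```python
# Hedged sketch: enable CSV benchmarking in a saved .kcpps settings file,
# assuming (as described above) it is plain JSON with a "benchmark" key
# that the GUI writes as null by default.
import json
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "settings.kcpps"

with open(path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["benchmark"] = "benchmark.csv"  # was null; the quotes around the filename matter

with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)

print(f"Updated {path}; launch koboldcpp with --config {path} to run the benchmark")
```

After that, launching with --config on the edited .kcpps should run the benchmark and save the results to benchmark.csv, as described above.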
-
I'm trying to figure this out. I downloaded nearly a dozen versions for Windows and nothing worked; all of them give me the dreaded "OSError: exception: access violation writing 0x0000000000000010" no matter what I do when trying to use hipBLAS. But after many downloads trying to find an older version that worked, I landed on v1.55.yr0, which allowed me to load small models that fit entirely on my GPU while using hipBLAS, yay.

And then the bad part. The whole reason I use koboldcpp is to load larger models than what can fit on my GPU (Goliath in this case), but if I try to do so using hipBLAS on v1.55.yr0, it does the same thing as before, the access violation. I have tried every combination of every setting, reinstalled my drivers, switched between AMD Adrenalin and Pro (don't worry, I DDU'd first), and googled everything, no luck at all.

Worth mentioning: on newer versions I can't even load small models like I can on v1.55.yr0, or I get the same access violation. Also worth mentioning: CLBlast works with everything, and I've been using that since hipBLAS won't run, but of course, I want to run hipBLAS.

v1.55.yr0 Successful run with SciPhi.txt
-
Llama 3 models throw access violation errors after a few hundred tokens. Running on a 16 GB Radeon 6800, so VRAM shouldn't be the problem. When switching to Vulkan, this does not happen. Maybe I'm missing something in the settings?
@YellowRoseCx's answer below:
"I've been looking into it for weeks and the only thing I can think of is that llama.cpp changed some code that's now incompatible with ROCm 5.7... the biggest change I know of between the last working version and the one that first broke is that llama.cpp split the ggml-cuda.cu file up into like 30 different files and did who knows what else to it.
I'm really sorry I haven't been able to figure it out and fix it yet.
If anyone is able to try and help narrow it down, it would be extremely appreciated"