ROCm access violation #46
13 comments · 24 replies
-
I found the same with any model I tried. I went back to 1.61.2 and functionality returned to normal.
-
Yeah, the newest version is unusable for me right now.
-
Same on a 7900 XTX.
-
Same with me, access violation. Directly in the Kobold UI or through ST, if that matters at all.
-
"exception: access violation reading 0x00000007F115C100" |
-
Updated to 1.64.1, still the same error.
-
Same issue here: v1.64.1 does not work on my 6600 XT.
-
v1.64.1 - same issue, access violation during BLAS prompt processing. Tried various configurations with no success.
-
Updated to 1.65, still the same error. @YellowRoseCx any idea why this is happening?
-
Don't see anyone saying anything about the new version, so... Using v1.66.1.yr1-ROCm with an RX 7800 XT on Adrenalin 24.3.1 drivers on Windows 10 22H2. Settings, if relevant:

Mainly using this with SillyTavern. Captioning works in Kobold Lite but not in SillyTavern; for some reason it's trying to use "/api/openai/caption-image" with "KoboldCpp" selected (dunno if it's just wrong or they never updated it). Rewriting that to "/sdapi/v1/interrogate" gives consistently inaccurate results, so I just rewrote "/api/extra/caption" to "/sdapi/v1/interrogate" for "Local", which seems to be mostly accurate, not really sure why.

Trying to generate an image works in neither. It adds an "Unavailable" placeholder in Kobold Lite that I can't delete in any way other than clicking back or creating a new chat/story, and SillyTavern through its "SD.Next (Vladmandic)" preset actually starts(?) generating the image, but it hangs: no CPU usage, no GPU usage, idle GPU clocks and power, no errors or anything in the console. It just stops at "Generating (20 st.)" and is completely unresponsive. When lowering steps to something like 5, it throws an access violation and an error, then continues generating text like nothing happened:

Performance (Generate, not Process or Total) bounces between 25 ms/T and 165 ms/T, not very consistent, but seems slightly higher on average than before (I've been using one of the versions from March).
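For anyone else chasing the captioning routing, here is a minimal sketch for probing the two endpoints named above directly, outside of SillyTavern, to see which one actually answers. The port, the payload field name, and the assumption that /sdapi/v1/interrogate takes an A1111-style base64 image are mine, not something I've verified against the KoboldCpp source; test.png is just a placeholder image.

```python
# Hedged sketch: probe the caption endpoints mentioned above so you can tell
# whether the problem is KoboldCpp itself or SillyTavern's routing.
# Assumes KoboldCpp is listening on its default port 5001 and that
# /sdapi/v1/interrogate accepts an A1111-style {"image": "<base64>"} body.
import base64
import requests

BASE = "http://localhost:5001"  # adjust if you changed the port

with open("test.png", "rb") as f:  # any small test image
    img_b64 = base64.b64encode(f.read()).decode("ascii")

for path in ("/sdapi/v1/interrogate", "/api/extra/caption"):
    try:
        r = requests.post(BASE + path, json={"image": img_b64}, timeout=120)
        print(path, r.status_code, r.text[:200])
    except requests.RequestException as exc:
        print(path, "failed:", exc)
```

Whichever endpoint returns a sensible caption here is the one worth pointing SillyTavern at; if both fail from a plain script, the problem is on the KoboldCpp side rather than the routing.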
-
Processing Prompt [BLAS] (107 / 107 tokens)

This is with a Phi Medium model at Q6_K on a 6900 XT, fully loaded. This is on the latest release, 1.66, with all the default settings.
-
For the benefit of those who aren't in on the Discord discussion thread: just as an FYI, all of these crashes only happen in the Windows version. A lot of it seems to be related to upstream changes that possibly just aren't compatible with the older ROCm version 5.7 for Windows (whereas the Linux version is 6.0 or 6.1, depending on how up to date it is).

On Windows, FlashAttention seems to prevent several crashes (and saves a tiny bit of VRAM, with better savings as context increases). But FlashAttention costs a huge amount of prompt-processing performance while improving token generation speed somewhat (so whether it's better or worse depends on usage and will vary), and FlashAttention is awful if anything less than 100% of a model's layers are loaded onto the GPU (even generation performance drops to about 50% with a partial offload, even if you only put one layer on the CPU). On Linux, even MoE models seem to load fine without FlashAttention.

Despite the instructions on the readme page, turning mmq on seems to produce worse results in just about everything. (And it's usually better to offload fewer layers than to use the lowvram option.) @lemon07r Deagle on Discord suggests that mmq being on seems to cause the crash with that model, so try turning it off and see what happens. mmq may be optimized more for NVIDIA than for AMD, or maybe it changed for the worse in newer versions, because the results just aren't good. Though one person reported ever so slightly higher results on a 6600, so YMMV I guess, but generally assume mmq should be off.

Don't forget you can always run the benchmark. This works best via the command prompt with --benchmark "benchmark.csv" on the command line so it saves to a file. You can save all the settings in the GUI to a .kcpps file and open that in the prompt with --config filename.kcpps, and you can edit that file in a text editor (Notepad or whatever -- I suggest setting Explorer to not hide file extensions, as that setting has major repercussions across the entirety of Windows and should never have been on by default) to change benchmark from null to "benchmark.csv" (the quotes are important!). Then you can just run that .kcpps every time you want to benchmark something.
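Since the .kcpps file saved by the GUI appears to be plain JSON, a tiny script can flip that benchmark field instead of hand-editing it in Notepad. This is only a sketch under that assumption; the key name "benchmark" and the null default are taken from the description above, not checked against the KoboldCpp source.

```python
# Hedged sketch: enable CSV benchmarking in a saved .kcpps settings file,
# assuming (as described above) it is plain JSON with a "benchmark" key
# that the GUI writes as null by default.
import json
import sys

path = sys.argv[1] if len(sys.argv) > 1 else "settings.kcpps"

with open(path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

cfg["benchmark"] = "benchmark.csv"  # was null; the quotes around the filename matter

with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)

print(f"Updated {path}; launch koboldcpp with --config {path} to run the benchmark")
```

After that, launching with --config on the edited .kcpps should run the benchmark and save the results to benchmark.csv, as described above.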
-
I'm trying to figure this out. I downloaded nearly a dozen versions for Windows and nothing worked; all of them give me the dreaded "OSError: exception: access violation writing 0x0000000000000010" no matter what I do when trying to use hipBLAS. But after many downloads trying to find an older version that worked, I landed on v1.55.yr0, which allowed me to load small models that fit entirely on my GPU while using hipBLAS, yay.

And then the bad part. The whole reason I use koboldcpp is to load larger models than what can fit on my GPU (Goliath in this case), but if I try to do so using hipBLAS on v1.55.yr0, it does the same thing as before, the access violation. I have tried every combination of every setting, reinstalled my drivers, switched between AMD Adrenalin and Pro (don't worry, I DDU'd first), and googled everything, no luck at all.

Worth mentioning: on newer versions I can't even load small models like I can on v1.55.yr0, or I get the same access violation. Also worth mentioning: CLBlast works with everything, and I've been using that since hipBLAS won't run, but of course, I want to run hipBLAS.

v1.55.yr0 Successful run with SciPhi.txt
-
Llama 3 models throw access violation errors after a few hundred tokens. Running on a 16 GB Radeon 6800, so VRAM shouldn't be the problem. When switching to Vulkan, this does not happen. Maybe I'm missing something in the settings?
@YellowRoseCx's answer below:
"I've been looking into it for weeks and the only thing I can think of is that llama.cpp changed some code that's now incompatible with ROCm 5.7... the biggest change I know of between the last working version and the one that first broke is that llama.cpp split the ggml-cuda.cu file up into like 30 different files and did who knows what else to it.
I'm really sorry I haven't been able to figure it out and fix it yet.
If anyone is able to try and help narrow it down, it would be extremely appreciated"