Not running on GPU even though bitsandbytes returns no errors #2770
Replies: 1 comment · 5 replies
-
You can ignore those warnings about "cuda drivers" from TensorFlow; they're not relevant for training. When you start a training run, can you check the log output? Does it say "accelerator device: cuda" or "accelerator device: cpu"?
-
I'm not sure which logfile you are talking about. I can't find that text in any of the logfiles created when I tried.
Some info on things:
And searching the log files for "accelerate", I find, for example, this in setup.log:
If you tell me exactly which logfile you want me to look in, I will provide the data.
-
The terminal output is the "log" I'm referring to. When you start a training run, there should be one line that says "accelerator device: cuda" or "accelerator device: cpu". Here's an example of my terminal. Does yours say cuda or cpu?
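If that line is hard to spot in the terminal, the same question can be asked of PyTorch directly, from the Python environment the webui runs in. This is only a minimal sketch and not part of the webui: `describe_torch_device` is a hypothetical helper name, and it relies on nothing beyond `torch.cuda.is_available()`, `torch.cuda.device_count()`, and `torch.version.cuda`.

```python
def describe_torch_device() -> dict:
    """Report whether PyTorch in this environment can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the interpreter running this script.
        return {"torch_installed": False, "device": "unknown"}

    cuda_ok = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the installed wheel was not built with CUDA support.
        "built_with_cuda": torch.version.cuda,
        "gpu_count": torch.cuda.device_count() if cuda_ok else 0,
        "device": "cuda" if cuda_ok else "cpu",
    }


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` comes back as `None`, the installed PyTorch is not a CUDA build, which would explain training on the CPU even when bitsandbytes itself reports no errors.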
-
Thank you for responding. So, I finally got time to play around with this again.
And then it just stays there, with the CPU fan going like crazy. And now, when I tried again after a reboot, I forgot to switch to LoRA, so DreamBooth training spat out this:
And then:
At least the memory is NOT utilized as stated in the log. I can verify that by watching nvtop. I have no clue what is going on here. :(
-
That LoRA rank of 100000 is way too high. Choose something reasonable between 16 and 256.
DreamBooth with SDXL will require "Full fp16 training" and "Gradient checkpointing" to avoid out-of-memory errors.
-
There are probably 100 things wrong with the configs, for example no descriptions on the images whatsoever. I pay no attention to that stuff until I get the application to actually run on my GPU. I haven't mentioned this, but I can see my CPU cores running at 100% in htop. I guess this is not made to work on any OS other than the spyware Windows 11, and maybe corporate versions of Linux... :(
-
I am using Arch, so getting older versions of CUDA natively on my machine is more or less out of the question. But I got it "working" by following this thread (link to the comment with the solution): #2651 (comment)
I set up a pyenv for 3.10.10.
I installed nvidia-cusparse-cu11, nvidia-cublas-cu11, and nvidia-cuda-runtime-cu11 using pip and added their library directories to $LD_LIBRARY_PATH.
And confirming with
python -m bitsandbytes
tells me it is working:
But when I start the webui, I get this:
The webui works fine, but when I start training, it of course only runs on the CPU.
I see others asking for help with similar problems (for example: link1 & link2), but I figured I'd ask again and mention that bitsandbytes reports that it should work.
Any ideas how to fix this would be highly appreciated.
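For anyone repeating the $LD_LIBRARY_PATH step above, the directories to add can be listed with a short script. This is a rough sketch under one assumption that is not stated in the post: that the pip-installed NVIDIA wheels place their shared libraries under `site-packages/nvidia/<component>/lib`. The helper names `find_nvidia_lib_dirs` and `build_ld_library_path` are made up for illustration.

```python
import os
import sysconfig
from pathlib import Path


def find_nvidia_lib_dirs(site_packages: Path) -> list:
    """Return every nvidia/<component>/lib directory under site_packages."""
    # Assumption: the NVIDIA pip wheels use a site-packages/nvidia/<component>/lib layout.
    nvidia_root = site_packages / "nvidia"
    if not nvidia_root.is_dir():
        return []
    return sorted(str(p) for p in nvidia_root.glob("*/lib") if p.is_dir())


def build_ld_library_path(lib_dirs: list, current: str = "") -> str:
    """Put lib_dirs in front of an existing LD_LIBRARY_PATH value, skipping duplicates."""
    existing = [p for p in current.split(":") if p]
    merged = lib_dirs + [p for p in existing if p not in lib_dirs]
    return ":".join(merged)


if __name__ == "__main__":
    # Run this with the same interpreter (for example the pyenv one) that pip installed into.
    site_packages = Path(sysconfig.get_paths()["platlib"])
    dirs = find_nvidia_lib_dirs(site_packages)
    value = build_ld_library_path(dirs, os.environ.get("LD_LIBRARY_PATH", ""))
    print(f'export LD_LIBRARY_PATH="{value}"')
```

The script only prints an `export` line to paste into the shell that launches the webui. It helps bitsandbytes locate the CUDA libraries, but it does not change whether the installed PyTorch build can use the GPU.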