Replies: 1 comment
-
In the above config, --max_data_loader_n_workers="48" was set to check whether there was a data-loading bottleneck; I tried other values as well with no change. When training an SD1.5 LoRA I can easily use 99% of the CUDA time.
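A quick way to confirm or rule out a data-loading bottleneck is to time how long each step spends waiting on the loader versus how long the training step itself takes. This is a generic sketch, not part of kohya's scripts: the `profile_loader` helper and the simulated slow loader below are hypothetical illustrations.

```python
import time

def profile_loader(loader, train_step, max_steps=100):
    """Split wall time per step into loader-wait vs. compute.

    If load_t dominates comp_t, the loader is the bottleneck and more
    workers (or caching) should help; if comp_t dominates, they won't.
    """
    load_t = comp_t = 0.0
    it = iter(loader)
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)      # time spent waiting for data
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)         # time spent in the actual step
        t2 = time.perf_counter()
        load_t += t1 - t0
        comp_t += t2 - t1
    return load_t, comp_t

# Made-up demo: a loader that takes 5 ms per batch feeding a 1 ms step,
# i.e. a clearly loader-bound pipeline.
def slow_batches(n):
    for i in range(n):
        time.sleep(0.005)
        yield i

load_t, comp_t = profile_loader(slow_batches(50), lambda b: time.sleep(0.001))
print(f"loader wait: {load_t:.3f}s, compute: {comp_t:.3f}s")
```

Wrapping the real training loop's `next(dataloader)` and step call the same way would show directly whether the GPU is starved for data or simply not saturated by the workload.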
-
I'm trying to train a LoRA for the base SDXL 1.0 model, but I can't seem to get my CUDA usage above 50%. Is there a reason for this? I have the recommended cuDNN libraries installed, and Kohya is at the latest release from a completely fresh Git pull, configured as normal for Windows, with all training local and GPU-based. I just tried increasing the number of data-loader workers and it made no difference.
The accelerate command I used was:
accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:/Models/Stable-Diffusion/Checkpoints/SDXL1.0/sd_xl_base_1.0.safetensors" --train_data_dir="E:/Models/Stable-Diffusion/Training/Lora/Blocks-XL\img" --resolution="1024,1024" --output_dir="E:/Models/Stable-Diffusion/Training/Lora/Blocks-XL\model" --logging_dir="E:/Models/Stable-Diffusion/Training/Lora/Blocks-XL\log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=0.0004 --unet_lr=0.0004 --network_dim=256 --output_name="Blocks-XL" --lr_scheduler_num_cycles="10" --no_half_vae --learning_rate="0.0004" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="17800" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --seed="12345" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False --max_data_loader_n_workers="48" --bucket_reso_steps=64 --save_state --gradient_checkpointing --xformers --bucket_no_upscale --noise_offset=0.0357 --sample_sampler=euler_a --sample_prompts="E:/Models/Stable-Diffusion/Training/Lora/Blocks-XL\model\sample\prompt.txt" --sample_every_n_steps="500"
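One detail worth noting about the command above: with --cache_latents --cache_latents_to_disk, the data-loader workers have very little to do per step (they mostly read precomputed latents back from disk), so raising --max_data_loader_n_workers to 48 would not be expected to change anything. A toy simulation of that effect, where all timings and helper names are made-up illustrations rather than kohya internals:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_cached(i):
    time.sleep(0.0005)   # reading a cached latent: cheap
    return i

def fetch_uncached(i):
    time.sleep(0.01)     # loading + encoding an image: expensive
    return i

def run(fetch, workers, n=32):
    """Total time to fetch n items with a given worker count."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(fetch, range(n)))
    return time.perf_counter() - t0

uncached_1 = run(fetch_uncached, 1)   # slow pipeline, one worker
uncached_8 = run(fetch_uncached, 8)   # more workers help here
cached_1 = run(fetch_cached, 1)       # cached pipeline is already fast
print(f"uncached x1: {uncached_1:.3f}s, uncached x8: {uncached_8:.3f}s, "
      f"cached x1: {cached_1:.3f}s")
```

More workers speed up the expensive path, but the cached path is already so cheap with a single worker that extra workers buy nothing, which would be consistent with worker count making no difference in this run.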
My system is fairly strong; not massive, but it should be able to keep all the CUDA cores busy.
Intel Core i9-13900K CPU
128GB DDR5 RAM
NVIDIA RTX 4090
4x 2TB Gen 5 NVMe SSDs