Replies: 3 comments
-
I also had this problem, but lowering network_dim or lowering the learning rate fixed it for me!
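For example, something along these lines in the training command (illustrative values only, not a recommendation):
--network_dim=32 --network_alpha=32 --unet_lr=1e-4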
-
Change --optimizer_type="AdamW8bit" to --optimizer_type="AdamW".
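In other words, swap the bitsandbytes 8-bit optimizer for the full-precision one in the training command:
--optimizer_type="AdamW"
AdamW8bit saves VRAM, but its 8-bit optimizer state is sometimes reported to be less numerically stable than plain AdamW.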
-
Thanks. It might have been network_dim; changing the LR or optimizer type didn't have an effect. I ended up installing Windows, and it's working there without any issue.
-
Not able to get very far into training. I've been attempting to train standard LoRAs, but the loss becomes NaN (sometimes infinite) regardless of the learning rate. After around 300 steps the loss starts increasing, and shortly after it goes to NaN and garbage samples are produced. This happens with anywhere from 20 to 500 512x512 images and the SD 1.5 model, on an Nvidia 4090 with a fresh install of Ubuntu 22.04.1. I've tried fp16 and bf16, with and without xformers, and I've pulled the latest code and re-run setup. Example command:
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" \
  --enable_bucket \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/home/user/landscape/img" \
  --resolution=512,512 \
  --output_dir="/home/user/landscape/output" \
  --logging_dir="/home/user/landscape/logs" \
  --network_alpha="128" \
  --save_model_as=safetensors \
  --network_module=networks.lora \
  --text_encoder_lr=5e-05 \
  --unet_lr=0.001 \
  --network_dim=128 \
  --output_name="landscape01" \
  --lr_scheduler_num_cycles="1" \
  --learning_rate="0.00001" \
  --lr_scheduler="constant" \
  --train_batch_size="2" \
  --max_train_steps="130" \
  --save_every_n_epochs="1" \
  --mixed_precision="fp16" \
  --save_precision="fp16" \
  --seed="1234" \
  --caption_extension=".txt" \
  --cache_latents \
  --optimizer_type="AdamW8bit" \
  --max_data_loader_n_workers="1" \
  --clip_skip=2 \
  --bucket_reso_steps=64 \
  --xformers \
  --bucket_no_upscale
I haven't been able to find anything regarding this issue elsewhere.
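One further check worth trying is to rule out half-precision overflow (a common cause of NaN losses) with a short full-precision run, changing just these two flags in the command above (a sketch, assuming the usual accepted values for these options):
--mixed_precision="no" --save_precision="float"
If the loss stays finite in full precision, the NaNs are likely an fp16 overflow rather than a data or driver problem.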