
When training an SD1.5 LoRA with --network_train_unet_only, the lora_up.weight values on the down side of the U-Net (IN1–IN8) all become 0 #821

Closed
ruu2 opened this issue Sep 15, 2023 · 6 comments

ruu2 commented Sep 15, 2023
While training an SD1.5 LoRA, I noticed that when I train only the U-Net by specifying --network_train_unet_only, the lora_up.weight values on the down side of the U-Net (IN1–IN8) are all 0 (i.e. they are zero matrices).

As I understand it, LoRA multiplies the two factor matrices together and uses the product in the computation, so if one of the matrices is zero, the product is also zero. In other words, this LoRA ends up in the same state as if the down side of the U-Net (IN1–IN8) had never been trained at all.
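To illustrate (a minimal sketch of the LoRA update, not the sd-scripts implementation): the adapted weight is W + lora_up @ lora_down, so a zero lora_up forces the entire LoRA contribution to zero regardless of what lora_down learned.

```python
import torch

# Minimal sketch: if lora_up is a zero matrix, the low-rank update
# lora_up @ lora_down is zero no matter what lora_down contains,
# so the module behaves as if it had never been trained.
rank, d_in, d_out = 4, 320, 320
lora_down = torch.randn(rank, d_in)   # trained, non-zero
lora_up = torch.zeros(d_out, rank)    # all zeros, as observed for IN1-IN8

delta_w = lora_up @ lora_down
print(torch.max(torch.abs(delta_w)).item())  # 0.0
```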

From my investigation, the problem appears to occur when --network_train_unet_only is combined with --gradient_checkpointing.

Here is what I used for verification:

# On Google Colab (T4)

!git clone https://github.com/kohya-ss/sd-scripts
%cd sd-scripts
!pip install --upgrade -r requirements.txt
!pip install -U xformers

!accelerate config default --mixed_precision fp16

%cd /content
# download lora_train_sample_pack.zip from https://note.com/kohya_ss/n/nb20c5187e15a, then
!unzip lora_train_sample_pack.zip -d /content

%cd /content/sd-scripts
!accelerate launch --num_cpu_threads_per_process 4 train_network.py \
    --gradient_checkpointing \
    --network_train_unet_only \
    --pretrained_model_name_or_path=emilianJR/AnyLORA \
    --train_data_dir=../train --reg_data_dir=../reg --prior_loss_weight=1.0 \
    --resolution 512 --output_dir=../lora_output --output_name=cjgg_frog \
    --train_batch_size=4 --learning_rate=1e-4 --max_train_epochs 4 \
    --use_8bit_adam --xformers --mixed_precision=fp16 --save_precision=fp16 \
    --seed 42 --save_model_as=safetensors --save_every_n_epochs=1 \
    --max_data_loader_n_workers=1 \
    --network_module=networks.lora --network_dim=4 \
    --training_comment="activate by usu frog"

from safetensors.torch import load_file
import torch

checkpoint_path="/content/lora_output/cjgg_frog.safetensors"

state_dict = load_file(checkpoint_path, device="cpu")
for key, value in state_dict.items():
    if "lora_up" in key:
        # Taking the max of the absolute values, so if this is 0 the whole matrix must be 0
        print(key, value.shape, torch.max(torch.abs(value)).item())

The output is as follows:

lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_mid_block_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0272674560546875
lora_unet_mid_block_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.02471923828125
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0181121826171875
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.025543212890625
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0207366943359375
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.02838134765625
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0257568359375
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.02239990234375
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0242156982421875
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0272979736328125
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0302276611328125
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.027557373046875
lora_unet_up_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0255889892578125
lora_unet_up_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0206298828125
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0223541259765625
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.02783203125
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0206146240234375
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.033843994140625
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.024505615234375
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.024383544921875
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.023712158203125
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0242462158203125
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.034210205078125
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.031524658203125
lora_unet_up_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.023651123046875
lora_unet_up_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.01959228515625
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.020721435546875
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0250244140625
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0210113525390625
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.022216796875
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0290985107421875
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0238800048828125
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0305633544921875
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0251922607421875
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0382080078125
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0191192626953125
lora_unet_up_blocks_1_attentions_2_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0205535888671875
lora_unet_up_blocks_1_attentions_2_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0255126953125
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.027801513671875
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0238494873046875
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.032012939453125
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0250396728515625
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0221405029296875
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0223388671875
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.017120361328125
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0206756591796875
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.03936767578125
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0195465087890625
lora_unet_up_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0204925537109375
lora_unet_up_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0206146240234375
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.019622802734375
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.02154541015625
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.01837158203125
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0229644775390625
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0188140869140625
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0231170654296875
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0203094482421875
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0232696533203125
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.027252197265625
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0230865478515625
lora_unet_up_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0254974365234375
lora_unet_up_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.026580810546875
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0205230712890625
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.02496337890625
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0167694091796875
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0270233154296875
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0216217041015625
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0238037109375
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0269927978515625
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.02593994140625
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.03436279296875
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0246734619140625
lora_unet_up_blocks_2_attentions_2_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0242156982421875
lora_unet_up_blocks_2_attentions_2_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.031646728515625
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0247344970703125
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0229034423828125
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.02459716796875
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0205535888671875
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.03570556640625
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0267333984375
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.034912109375
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0245513916015625
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.037506103515625
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.033843994140625
lora_unet_up_blocks_3_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0224609375
lora_unet_up_blocks_3_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.027374267578125
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0154266357421875
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.025543212890625
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.018585205078125
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.02374267578125
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.020172119140625
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0242462158203125
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0149078369140625
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0281829833984375
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0263519287109375
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.030426025390625
lora_unet_up_blocks_3_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0248565673828125
lora_unet_up_blocks_3_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.025543212890625
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0166168212890625
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0282745361328125
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.019744873046875
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.01971435546875
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0212860107421875
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.02886962890625
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.017578125
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0234527587890625
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.028167724609375
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0362548828125
lora_unet_up_blocks_3_attentions_2_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.031463623046875
lora_unet_up_blocks_3_attentions_2_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.02984619140625
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0193023681640625
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.028350830078125
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.020172119140625
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.024871826171875
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.026885986328125
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.02764892578125
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0237274169921875
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.03302001953125
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.04229736328125
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.04046630859375
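A dump like the one above can also be summarized by counting the all-zero lora_up tensors per U-Net region; a minimal sketch, with a small synthetic state_dict standing in for the result of load_file:

```python
import torch
from collections import Counter

# Synthetic stand-in for load_file(checkpoint_path): one zero down-block
# tensor, plus non-zero mid/up tensors, mirroring the dump above.
state_dict = {
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight": torch.zeros(320, 4),
    "lora_unet_mid_block_attentions_0_proj_in.lora_up.weight": torch.randn(1280, 4),
    "lora_unet_up_blocks_1_attentions_0_proj_in.lora_up.weight": torch.randn(1280, 4),
}

def region(key):
    # Map a key to its U-Net region based on the name segments.
    for name in ("down_blocks", "mid_block", "up_blocks"):
        if name in key:
            return name
    return "other"

# Count all-zero lora_up tensors per region.
zero_counts = Counter(
    region(k) for k, v in state_dict.items()
    if "lora_up" in k and torch.count_nonzero(v) == 0
)
print(dict(zero_counts))  # {'down_blocks': 1}
```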

Since the result changes depending on whether --gradient_checkpointing is set, I suspect there is some kind of bug in that part of the code.

Apologies in advance if I have missed something or misunderstood anything.

kohya-ss (Owner) commented

Since the lora_up weights are initialized to 0, it does seem possible for them to remain 0 in some cases (e.g. when the training data is simple, when the number of steps is small, or when training in fp16).

On my side, under the same conditions (U-Net-only training, gradient checkpointing enabled), IN1–IN8 did appear to get trained.

If possible, could you try float or bf16, or try with other data?
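For context, the usual LoRA initialization convention, which per the above also applies to lora_up here, zero-initializes lora_up so the adapter starts as a no-op; a lora_up that never receives a gradient update therefore stays exactly zero. A minimal sketch of that convention (following the LoRA paper's defaults, not sd-scripts' exact code):

```python
import torch
import torch.nn as nn

# Sketch of the standard LoRA init: lora_down gets a random init,
# lora_up starts at zero so W + lora_up @ lora_down == W initially.
rank, features = 4, 320
lora_down = nn.Linear(features, rank, bias=False)
lora_up = nn.Linear(rank, features, bias=False)
nn.init.kaiming_uniform_(lora_down.weight, a=5 ** 0.5)
nn.init.zeros_(lora_up.weight)

# Before any gradient step, lora_up is exactly zero.
print(torch.max(torch.abs(lora_up.weight)).item())  # 0.0
```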

ruu2 (Author) commented Sep 25, 2023

Thank you for checking. I ran some additional experiments on my side as well.

First, on Google Colab, I retried the example from my original report with float, and the problem still reproduced. That is:
with --gradient_checkpointing, U-Net-only training left the IN1–IN8 lora_up weights all 0, while
without --gradient_checkpointing, U-Net-only training produced non-zero lora_up weights for IN1–IN8.

Next, I tried with float using the 61 Tohoku Zunko images from
https://drive.google.com/drive/folders/1BP3EEOIAf7K-AciY1QDkQGq_spqu0CXM
(reachable from https://zunko.jp/con_illust.html#illust_07000), and the problem reproduced again.

I have not tried bf16, but since the problem reproduces even with float, it does not look like the values were simply too small in magnitude and got flushed to 0, nor does it seem that the matrices became zero because the training data was too simple.

The experiments reported here only go up to 610 steps, but the problem itself is not new; I simply had not noticed it until recently. It was already present the first time I trained U-Net only, at the beginning of June this year, when I ran roughly 5000 steps by my estimate, so it does not seem to be a matter of step count either. At the time, looking only at the inference results, I never noticed that the IN1–IN8 lora_up values were zero. While this is not that 5000-step example, even after working around the problem by turning off --gradient_checkpointing, the quality of the resulting LoRA does not change in any obvious way, so unless you apply the LoRA layer by layer, or inspect the contents of the safetensors file directly, I doubt you would notice this problem at all.

On the other hand, since the problem did not reproduce on kohya-san's side, I suspected an environment issue and first carefully matched versions on Google Colab (Python 3.10.6, with torch==1.12.1+cu116 and torchvision==0.13.1+cu116; I disabled --xformers so the xformers version would not matter, and used --optimizer_type=AdamW so the bitsandbytes version would not matter). The result was unchanged: the problem reproduced.

Having run out of things to try on Google Colab, I also tested on Windows.
My Windows machine does not have enough VRAM to run on the GPU, so I ran the test on the CPU instead; the result was again the same, and the problem reproduced.

For these runs, I selected CPU only in accelerate config and set mixed precision to no. With --gradient_checkpointing enabled I ran:

accelerate launch --num_cpu_threads_per_process 4 train_network.py ^
  --gradient_checkpointing ^
  --network_train_unet_only ^
  --pretrained_model_name_or_path=emilianJR/AnyLORA ^
  --train_data_dir=./train  --prior_loss_weight=1.0 ^
  --resolution 512 --output_dir=./lora_output --output_name=cjgg_frog ^
  --train_batch_size=1 --learning_rate=1e-4 --max_train_epochs 4 ^
  --optimizer_type="AdamW" --mixed_precision=no --save_precision=float ^
  --seed 42 --save_model_as=safetensors --save_every_n_epochs=1 ^
  --max_data_loader_n_workers=1 ^
  --network_module=networks.lora --network_dim=4 ^
  --training_comment="activate by usu frog"

and without --gradient_checkpointing I ran:

accelerate launch --num_cpu_threads_per_process 4 train_network.py ^
  --network_train_unet_only ^
  --pretrained_model_name_or_path=emilianJR/AnyLORA ^
  --train_data_dir=./train  --prior_loss_weight=1.0 ^
  --resolution 512 --output_dir=./lora_output --output_name=cjgg_frog ^
  --train_batch_size=1 --learning_rate=1e-4 --max_train_epochs 4 ^
  --optimizer_type="AdamW" --mixed_precision=no --save_precision=float ^
  --seed 42 --save_model_as=safetensors --save_every_n_epochs=1 ^
  --max_data_loader_n_workers=1 ^
  --network_module=networks.lora --network_dim=4 ^
  --training_comment="activate by usu frog"

For the training data I used just one of the frog images from my first report and ran only 4 steps. With --gradient_checkpointing, the IN1–IN8 lora_up values are 0; without --gradient_checkpointing, the IN1–IN8 lora_up values are non-zero.

I am pasting the results below as well. Note that with --gradient_checkpointing, the maximum absolute value of each tensor in the lines starting with lora_unet_down (corresponding to IN1–IN8) is 0, whereas without --gradient_checkpointing the corresponding entries are non-zero. The MID and OUT3–OUT11 values are also included; the lora_down values are not shown.
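The zero/non-zero divergence between the two runs can also be checked programmatically; a minimal sketch, with synthetic tensors standing in for the two loaded checkpoints:

```python
import torch

# Synthetic stand-ins for load_file() on the two runs: the same key is
# all-zero with gradient checkpointing, non-zero without it.
with_gc = {
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight": torch.zeros(320, 4),
}
without_gc = {
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight": torch.full((320, 4), 0.01),
}

# Flag keys whose "all zero" status differs between the two checkpoints.
diverging = [
    k for k in with_gc
    if (torch.count_nonzero(with_gc[k]) == 0) != (torch.count_nonzero(without_gc[k]) == 0)
]
print(diverging)
```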

With --network_train_unet_only and --gradient_checkpointing enabled:

lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0
lora_unet_mid_block_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00036185557837598026
lora_unet_mid_block_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00035583777935244143
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00032123090932145715
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00036821336834691465
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.00031216975185088813
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003795026277657598
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.00036390256718732417
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00036777640343643725
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00032518431544303894
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003906030615326017
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.00038883660454303026
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0003581437631510198
lora_unet_up_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00037767706089653075
lora_unet_up_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00038220194983296096
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00034422139287926257
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003663273237179965
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.00036557746352627873
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003712256730068475
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.000357157492544502
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003768591268453747
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.000339333841111511
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003740468237083405
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.00037773532676510513
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.00034959721961058676
lora_unet_up_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003689294680953026
lora_unet_up_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00038502979441545904
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003331990446895361
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003763235581573099
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003360706032253802
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.000365647574653849
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003542778140399605
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00037658686051145196
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00033763988176360726
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.00037374987732619047
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.00037000273005105555
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.000350741611327976
lora_unet_up_blocks_1_attentions_2_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00036932603688910604
lora_unet_up_blocks_1_attentions_2_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003794131916947663
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00035218364791944623
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003713464247994125
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.00035351974656805396
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003834139497485012
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003857198462355882
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00038988509913906455
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00037257076473906636
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.00038148468593135476
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0003714245976880193
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.00035136332735419273
lora_unet_up_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003558371972758323
lora_unet_up_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.00036457765963859856
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0003575762966647744
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003655114269349724
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0003622748772613704
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0003575003065634519
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.000308857299387455
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00034954390139319
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.00028865726199001074
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0003499282756820321
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0003832595539279282
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.00034167003468610346
lora_unet_up_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003634687454905361
lora_unet_up_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.00037907989462837577
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0003558970056474209
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003716212522704154
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0003434013924561441
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.00035716136335395277
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0003223948588129133
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00036272488068789244
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.00032365735387429595
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.00036831985926255584
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0003924330521840602
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.00034591183066368103
lora_unet_up_blocks_2_attentions_2_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003737954830285162
lora_unet_up_blocks_2_attentions_2_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003722494875546545
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0003753896744456142
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00035797178861685097
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0003556490992195904
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.00037642233655788004
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.00037785188760608435
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00038810016121715307
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0003705632989294827
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0003780625993385911
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0003725170681718737
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0003592813154682517
lora_unet_up_blocks_3_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00036690643173642457
lora_unet_up_blocks_3_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00037337298272177577
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.00033812999026849866
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00034423256875015795
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.00034116534516215324
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00034960766788572073
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0003428072086535394
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003485977940727025
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.00034521662746556103
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.00034382572630420327
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.00037084711948409677
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003401917056180537
lora_unet_up_blocks_3_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00036823618574999273
lora_unet_up_blocks_3_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00037233284092508256
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.00034028725349344313
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00035911801387555897
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.00035090954042971134
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00035937080974690616
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.000327248708344996
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003511085524223745
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0003417452098801732
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0003586155653465539
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0003897074784617871
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.00037334946682676673
lora_unet_up_blocks_3_attentions_2_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003962226619478315
lora_unet_up_blocks_3_attentions_2_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00039170763920992613
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0003676573687698692
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003919925366062671
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0003554837894625962
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00038323376793414354
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0003599445044528693
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003890149528160691
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0003552730195224285
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0003881815355271101
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.00038900101208128035
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003791723574977368
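The listings above (key, shape, mean absolute value) can be reproduced with a loop like the following. This is a minimal sketch: the `report`/`zero_lora_up_keys` helpers and the file path are illustrative, not part of the issue; only `safetensors.torch.load_file` comes from the snippet at the top. A mean of exactly 0.0 means that module's `lora_up.weight` is a zero matrix, so the LoRA delta (`lora_up @ lora_down`) for that module is zero.

```python
import torch


def zero_lora_up_keys(state_dict):
    """Return the lora_up.weight keys whose tensors are entirely zero."""
    return [
        key
        for key, tensor in state_dict.items()
        if key.endswith("lora_up.weight") and torch.count_nonzero(tensor) == 0
    ]


def report(path):
    # Imported lazily so zero_lora_up_keys can be used without safetensors installed.
    from safetensors.torch import load_file

    sd = load_file(path)
    for key, tensor in sd.items():
        if key.endswith("lora_up.weight"):
            # Same format as the listings above: key, shape, mean |value|.
            print(key, tuple(tensor.shape), tensor.abs().mean().item())
    return zero_lora_up_keys(sd)


# e.g. report("../lora_output/cjgg_frog.safetensors")
```

For the run with `--gradient_checkpointing`, a check like this flags every down-block key (IN1 to IN8) as all-zero, while the mid and up blocks have nonzero means.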

With --network_train_unet_only but without --gradient_checkpointing:

lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00035000924253836274
lora_unet_down_blocks_0_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003771119809243828
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.000377399061108008
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00037607611739076674
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0003600403724703938
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00039541092701256275
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.0003454448306001723
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003759291139431298
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.00033264741068705916
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.000366412743460387
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.00037972122663632035
lora_unet_down_blocks_0_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003755180223379284
lora_unet_down_blocks_0_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003610811836551875
lora_unet_down_blocks_0_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003704876871779561
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0003411458746995777
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003604711382649839
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.000344800588209182
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00036430361797101796
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.00033463811269029975
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.0003748398448806256
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0003482129250187427
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0003717427607625723
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.00038511911407113075
lora_unet_down_blocks_0_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003468696086201817
lora_unet_down_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003666365519165993
lora_unet_down_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003680391237139702
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.000348083907738328
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.000381159974494949
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0003378246328793466
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0003894024412147701
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.00036201014881953597
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003636722103692591
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.00033681586501188576
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.00035925384145230055
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0003763156710192561
lora_unet_down_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0003411587094888091
lora_unet_down_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003655209729913622
lora_unet_down_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003952115075662732
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.0003388429759070277
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003695542400237173
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.00034141287324018776
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.00038011331344023347
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.0003625397803261876
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00037728846655227244
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.00034112174762412906
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.00036503581213764846
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.00036841107066720724
lora_unet_down_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0003431103250477463
lora_unet_down_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003629806451499462
lora_unet_down_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003915571141988039
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003323135315440595
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00039108871715143323
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003320874529890716
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003772812196984887
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003591840504668653
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00038109885645098984
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003405098686926067
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.00038695248076692224
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0003810965863522142
lora_unet_down_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0003632633888628334
lora_unet_down_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00038418310577981174
lora_unet_down_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00037535844603553414
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00033975348924286664
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003616191679611802
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003394156228750944
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.00038703554309904575
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003802468127105385
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003843651502393186
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00034197058994323015
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003816100943367928
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0003826729953289032
lora_unet_down_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.000345188076607883
lora_unet_mid_block_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003617364855017513
lora_unet_mid_block_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003559306787792593
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00032103052944876254
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003681328671518713
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.00031223444966599345
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003794568183366209
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003638354246504605
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00036781199742108583
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00032433855812996626
lora_unet_mid_block_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003904931072611362
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0003888393403030932
lora_unet_mid_block_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.00035801282501779497
lora_unet_up_blocks_1_attentions_0_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00037751049967482686
lora_unet_up_blocks_1_attentions_0_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00038169207982718945
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.0003438650746829808
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00036676073796115816
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.000365308893378824
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003712676407303661
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.00035712873796001077
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00037693354533985257
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.00033933823578990996
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003737840452231467
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.00037765403976663947
lora_unet_up_blocks_1_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.00034971011336892843
lora_unet_up_blocks_1_attentions_1_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003693469043355435
lora_unet_up_blocks_1_attentions_1_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003849162021651864
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00033328490098938346
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.0003763809218071401
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.00033616021391935647
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.00036608066875487566
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.00035451422445476055
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00037665999843738973
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003374690131749958
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003738179220817983
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.000369932793546468
lora_unet_up_blocks_1_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0003506466746330261
lora_unet_up_blocks_1_attentions_2_proj_in.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.00036920298589393497
lora_unet_up_blocks_1_attentions_2_proj_out.lora_up.weight torch.Size([1280, 4, 1, 1]) 0.0003793858631979674
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([1280, 4]) 0.00035203414154239
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00037136138416826725
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([1280, 4]) 0.000353498209733516
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([1280, 4]) 0.000383460836019367
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([1280, 4]) 0.000385990715585649
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([1280, 4]) 0.00038967985892668366
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([1280, 4]) 0.0003722819674294442
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([1280, 4]) 0.0003819598350673914
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([10240, 4]) 0.0003713000623974949
lora_unet_up_blocks_1_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([1280, 4]) 0.0003513301780913025
lora_unet_up_blocks_2_attentions_0_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.00035596557427197695
lora_unet_up_blocks_2_attentions_0_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.00036455647205002606
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.000357579265255481
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00036386007559485734
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.0003622189105954021
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0003574440779630095
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.000308840477373451
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00034946296364068985
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.000288636569166556
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.00034988191328011453
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.00038327579386532307
lora_unet_up_blocks_2_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.00034182003582827747
lora_unet_up_blocks_2_attentions_1_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003633967717178166
lora_unet_up_blocks_2_attentions_1_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.000378972792532295
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.00035586010199040174
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003714319027494639
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.00034315945231355727
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.0003563177306205034
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.00032239416032098234
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00036273134173825383
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.0003236671327613294
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0003683438990265131
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.0003924337215721607
lora_unet_up_blocks_2_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.0003458849969319999
lora_unet_up_blocks_2_attentions_2_proj_in.lora_up.weight torch.Size([640, 4, 1, 1]) 0.0003738765371963382
lora_unet_up_blocks_2_attentions_2_proj_out.lora_up.weight torch.Size([640, 4, 1, 1]) 0.00037242445978336036
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([640, 4]) 0.00037517331656999886
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([640, 4]) 0.0003578221658244729
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([640, 4]) 0.00035568844759836793
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([640, 4]) 0.00037668325239792466
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([640, 4]) 0.00037766669993288815
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([640, 4]) 0.00038809471880085766
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([640, 4]) 0.00037058230373077095
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([640, 4]) 0.0003779935068450868
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([5120, 4]) 0.00037231389433145523
lora_unet_up_blocks_2_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([640, 4]) 0.00035922424285672605
lora_unet_up_blocks_3_attentions_0_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003667009877972305
lora_unet_up_blocks_3_attentions_0_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00037345924647524953
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0003381232963874936
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.000344164262060076
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.00034112256253138185
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0003495448618195951
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.00034283450804650784
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00034895926364697516
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.00034550693817436695
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.000343824562150985
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0003708489239215851
lora_unet_up_blocks_3_attentions_0_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003401975554879755
lora_unet_up_blocks_3_attentions_1_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003682337992358953
lora_unet_up_blocks_3_attentions_1_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003723211120814085
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.0003402938600629568
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00035909557482227683
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0003507696383167058
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.0003593892906792462
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.00032734524575062096
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00035111591569148004
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0003414647071622312
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.0003586857346817851
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.0003898872237186879
lora_unet_up_blocks_3_attentions_1_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.0003731420438271016
lora_unet_up_blocks_3_attentions_2_proj_in.lora_up.weight torch.Size([320, 4, 1, 1]) 0.0003960947215091437
lora_unet_up_blocks_3_attentions_2_proj_out.lora_up.weight torch.Size([320, 4, 1, 1]) 0.00039179367013275623
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_k.lora_up.weight torch.Size([320, 4]) 0.00036762055242434144
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00039198383456096053
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_q.lora_up.weight torch.Size([320, 4]) 0.0003554147551767528
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn1_to_v.lora_up.weight torch.Size([320, 4]) 0.00038312573451548815
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_k.lora_up.weight torch.Size([320, 4]) 0.00035997387021780014
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_out_0.lora_up.weight torch.Size([320, 4]) 0.00038902624510228634
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_q.lora_up.weight torch.Size([320, 4]) 0.0003552096022758633
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_attn2_to_v.lora_up.weight torch.Size([320, 4]) 0.00038819567998871207
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_0_proj.lora_up.weight torch.Size([2560, 4]) 0.00038897243211977184
lora_unet_up_blocks_3_attentions_2_transformer_blocks_0_ff_net_2.lora_up.weight torch.Size([320, 4]) 0.00037899165181443095

To summarize: as far as I have confirmed (on Google Colab T4 and in a Windows CPU run),
when training with --network_train_unet_only,

  • with --gradient_checkpointing, the lora_up weights for IN1〜IN8 are all zero, and
  • without --gradient_checkpointing, the lora_up weights for IN1〜IN8 contain non-zero values.

This result reproduces consistently, and it does not appear to be caused by fp16, the training data, or the number of steps; I also doubt it is specific to my environment.

That said, if the problem does not reproduce on your side: I had assumed that simply combining --network_train_unet_only with --gradient_checkpointing was enough to reproduce it, but perhaps my description of the conditions is incomplete.

Some miscellaneous supplementary notes:

  • Even when the problem occurs, the lora_down weights (as opposed to lora_up) do contain non-zero values.
  • In the few runs I tried with SDXL, the same problem did not seem to occur.
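A quick way to check a saved LoRA for this symptom is to scan its state dict for lora_up.weight tensors that are entirely zero. The sketch below uses a synthetic dict with illustrative key names; for a real file you would load the dict with safetensors (e.g. `safetensors.numpy.load_file("lora.safetensors")`):

```python
import numpy as np

def find_dead_lora_up(state_dict, atol=0.0):
    """Return keys of lora_up.weight tensors whose values are all zero."""
    return [
        key
        for key, tensor in state_dict.items()
        if key.endswith("lora_up.weight") and np.max(np.abs(tensor)) <= atol
    ]

# Synthetic state dict standing in for a loaded safetensors file.
sd = {
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight":
        np.zeros((320, 4), dtype=np.float16),          # the broken case
    "lora_unet_down_blocks_0_attentions_0_proj_in.lora_down.weight":
        np.full((4, 320), 2.1e-2, dtype=np.float16),   # lora_down is fine
    "lora_unet_up_blocks_1_attentions_0_proj_in.lora_up.weight":
        np.full((1280, 4), 3.7e-4, dtype=np.float16),  # a healthy up weight
}
print(find_dead_lora_up(sd))
# → ['lora_unet_down_blocks_0_attentions_0_proj_in.lora_up.weight']
```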

@kohya-ss (Owner) commented:

Thank you for the detailed follow-up verification. On re-checking, I found that I had been testing with the SDXL version of the LoRA. My apologies. I will verify under the same conditions with SD1.5 as soon as I can find the time.

I am very sorry for the trouble, and thank you for your patience.

@kohya-ss (Owner) commented Oct 1, 2023:

I have confirmed with SD1.5 and was able to reproduce the behavior you described. I apologize for putting you to the trouble of verifying it.

It is unclear why only IN1〜IN8 fail to train (rather than nothing training at all, or an error being raised), but I have also confirmed that calling requires_grad_(True) on the first parameters of the U-Net allows these layers to be trained as well.
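The mechanism behind the requires_grad_(True) workaround can be sketched with a toy PyTorch example (illustrative names only, not sd-scripts code): with reentrant gradient checkpointing, a checkpointed segment whose explicit inputs do not require grad produces an output that does not require grad either, so adapter weights inside the segment never receive gradients even though they are trainable.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as cp

torch.manual_seed(0)

frozen = nn.Linear(8, 8)          # stands in for a frozen U-Net layer
for p in frozen.parameters():
    p.requires_grad_(False)
adapter = nn.Linear(8, 8)         # stands in for a trainable LoRA module

def block(x):
    return frozen(x) + adapter(x)

x = torch.randn(2, 8)             # input does not require grad
y = cp.checkpoint(block, x, use_reentrant=True)
print(y.requires_grad)            # False — backward can never reach the adapter

# The workaround: force the input side to require grad so the
# checkpointed segment participates in backward.
x.requires_grad_(True)
y = cp.checkpoint(block, x, use_reentrant=True)
y.sum().backward()
print(adapter.weight.grad is not None)  # True — the adapter now receives gradients
```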

It would be possible to update the main branch directly, but I intend to make the fix on the sdxl branch and then merge it into main.

Best regards.

@kohya-ss (Owner) commented Oct 9, 2023:

This has been addressed in #846, which is now merged into main. I would appreciate it if you could verify. Apologies for the delayed report, and thank you.

@ruu2 (Author) commented Oct 9, 2023:

I have confirmed that the problem in my original report is resolved. Thank you for the fix.

@ruu2 closed this as completed on Oct 9, 2023.