
AttributeError: 'NoneType' object has no attribute 'view' when training flux-lora with loss_type = "smooth_l1" #565

Open
SunWintor opened this issue Nov 2, 2024 · 1 comment
Labels
invalid This doesn't seem right

Comments

@SunWintor

Description:

I'm experiencing an issue when training a flux model using the lora-scripts project (version 1.9.0). The training process crashes with an AttributeError stating that 'NoneType' object has no attribute 'view' when using loss_type = "smooth_l1".

Error Message:

Traceback (most recent call last):
  File "A:\funny\flux_train\Lora训练-测试版\lora-scripts-v1.9.0\scripts\dev\flux_train_network.py", line 565, in <module>
    trainer.train(args)
  File "A:\funny\flux_train\Lora训练-测试版\lora-scripts-v1.9.0\scripts\dev\train_network.py", line 1192, in train
    loss = train_util.conditional_loss(
  File "A:\funny\flux_train\Lora训练-测试版\lora-scripts-v1.9.0\scripts\dev\library\train_util.py", line 5815, in conditional_loss
    huber_c = huber_c.view(-1, 1, 1, 1)
AttributeError: 'NoneType' object has no attribute 'view'

Training Parameters:

Here are the training parameters I'm using:

model_train_type = "flux-lora"
pretrained_model_name_or_path = "A:/funny/ConfyUI-aki/ComfyUI-aki-v1.3/models/unet/flux1-dev.sft"
ae = "A:/funny/ConfyUI-aki/ComfyUI-aki-v1.3/models/vae/ae.safetensors"
clip_l = "A:/funny/ConfyUI-aki/ComfyUI-aki-v1.3/models/clip/clip_l.safetensors"
t5xxl = "A:/funny/ConfyUI-aki/ComfyUI-aki-v1.3/models/clip/t5xxl_fp16.safetensors"
clip_g = "A:/funny/ConfyUI-aki/ComfyUI-aki-v1.3/models/clip/clip_g.safetensors"
timestep_sampling = "shift"
sigmoid_scale = 1
model_prediction_type = "raw"
discrete_flow_shift = 3.158
loss_type = "smooth_l1"
guidance_scale = 1
train_data_dir = "A:/funny/flux_train/Lora训练-测试版/lora-scripts-v1.9.0/train/model_v5"
prior_loss_weight = 1
resolution = "1024,1024"
enable_bucket = true
min_bucket_reso = 256
max_bucket_reso = 2048
bucket_reso_steps = 64
bucket_no_upscale = true
output_name = "model_v5"
output_dir = "./output"
save_model_as = "safetensors"
save_precision = "bf16"
save_every_n_epochs = 2
max_train_epochs = 30
train_batch_size = 1
gradient_checkpointing = true
gradient_accumulation_steps = 1
network_train_unet_only = true
network_train_text_encoder_only = false
learning_rate = 0.0001
unet_lr = 1
text_encoder_lr = 1
lr_scheduler = "cosine"
lr_warmup_steps = 0
lr_scheduler_num_cycles = 0
optimizer_type = "Prodigy"
optimizer_args = [
"decouple=True",
"weight_decay=0.01",
"use_bias_correction=True",
"d_coef=1"
]
network_module = "networks.lora_flux"
network_dim = 8
network_alpha = 8
log_with = "tensorboard"
logging_dir = "./logs"
caption_extension = ".txt"
shuffle_caption = false
weighted_captions = false
keep_tokens = 0
seed = 21337
clip_skip = 1
mixed_precision = "bf16"
fp8_base = true
sdpa = true
lowram = false
cache_latents = true
cache_latents_to_disk = true
cache_text_encoder_outputs = true
cache_text_encoder_outputs_to_disk = true
persistent_data_loader_workers = true

Steps to Reproduce:

  1. Set up the training environment using the parameters specified above.
  2. Begin training.
  3. The error occurs during the training process.

Investigation and Findings:

  • The error indicates that huber_c is None, and the code attempts to call huber_c.view(-1, 1, 1, 1), which leads to an AttributeError.
  • In scripts/dev/train_network.py at line 1187, the get_noise_pred_and_target function from scripts/dev/flux_train_network.py is called.
  • The get_noise_pred_and_target function returns huber_c as None.
  • At line 1191 in scripts/dev/train_network.py, the conditional_loss function from scripts/dev/library/train_util.py is called, which attempts to execute huber_c.view(-1, 1, 1, 1).
  • Since huber_c is None, this results in the AttributeError.
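
The failing branch can be exercised in isolation with a scalar sketch. The formula below approximates the pseudo-Huber loss used for smooth_l1 in sd-scripts; the function name and signature are my assumptions for illustration, not the upstream code:

```python
import math

def smooth_l1_loss(pred, target, huber_c):
    # Scalar stand-in for the smooth_l1 branch of train_util.conditional_loss.
    # Upstream reshapes huber_c with .view(-1, 1, 1, 1) before applying the
    # formula, which is exactly where a None huber_c raises AttributeError.
    if huber_c is None:
        raise ValueError("huber_c is required for smooth_l1 loss")
    diff = pred - target
    # Pseudo-Huber: behaves like L2 near zero and like L1 for large errors.
    return 2 * huber_c * (math.sqrt(diff * diff + huber_c * huber_c) - huber_c)

print(smooth_l1_loss(1.0, 0.0, 0.1))
```

A guard like the ValueError above would at least fail with a clear message instead of the opaque AttributeError deep inside the reshape.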

Additional Information:

  • I observed that this issue seems to occur when training flux models using the huber or smooth_l1 loss types.
  • Other users in the community have not reported this issue, suggesting it might be specific to my environment or configuration.
  • I have tried reinstalling the latest version of lora-scripts (v1.10.0), but the problem persists.
  • I'm not deeply familiar with Python, so I'm uncertain whether the code is intended to call get_noise_pred_and_target from scripts/dev/train_network.py or if conditional_loss should be using a different module (e.g., sd-scripts/library/train_util.py or scripts/stable/library/train_util.py).
  • It appears that there might be an incorrect reference or import in the code that's causing the wrong function to be called.
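
To illustrate the suspected call chain, here is a pure-Python mock (all names are taken from the traceback; the bodies are stand-ins, not the real implementations) showing how a None huber_c returned by the flux override propagates into the loss function:

```python
def get_noise_pred_and_target():
    # Mock of the flux override: it apparently returns huber_c as None.
    noise_pred, target = 0.0, 0.0  # stand-ins for tensors
    huber_c = None
    return noise_pred, target, huber_c

def conditional_loss(pred, target, huber_c):
    # Mock of train_util.conditional_loss: the first thing the huber/smooth_l1
    # path does is reshape huber_c, so a None crashes immediately.
    return huber_c.view(-1, 1, 1, 1)

pred, target, huber_c = get_noise_pred_and_target()
try:
    conditional_loss(pred, target, huber_c)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'view'
```

This matches the traceback exactly, which is why I suspect the flux code path never computes huber_c rather than, say, a dtype or device problem.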

Environment:

  • Operating System: Windows 10 (64-bit)
  • Python Version: 3.12.2
  • lora-scripts Version: v1.10.0
  • Relevant Hardware: NVIDIA RTX 4090
  • Dependencies: Up-to-date versions as per the lora-scripts installation instructions

Expected Behavior:

  • The training process should proceed without errors when using loss_type = "smooth_l1" with flux models.
  • huber_c should be properly initialized and not be None when required.

Actual Behavior:

  • The training process fails with an AttributeError because huber_c is None and does not have a view method.

Request:

  • Could you please help identify the cause of this issue?
  • Is there a possible fix or workaround that I can apply?
  • Do I need to adjust my training parameters or update any code references?
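
If it helps others hitting this in the meantime: since only the huber and smooth_l1 loss types reach the huber_c branch, I assume (untested on my side beyond reading the code) that reverting to the default loss avoids the crashing path entirely:

```
loss_type = "l2"
```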

Thank you for your assistance!

If you need any additional information or logs, please let me know, and I'll be happy to provide them.

@Akegarasu Akegarasu added the invalid This doesn't seem right label Nov 5, 2024
@bobo3313

me too
