
use Blocks to swap error #3015

Open
Nomination-NRB opened this issue Dec 13, 2024 · 2 comments

Comments

@Nomination-NRB

Nomination-NRB commented Dec 13, 2024

I want to do full checkpoint (DreamBooth) training for FLUX, using flux-dev-de-distill as the base model, ae.safetensors as the VAE, and clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors as the text encoders, on a single 3090.

(I tried using two 3090s, but it always fails even when I set the correct parameters in the accelerate launch tab.)

[screenshot of the training settings]

When I use Blocks to swap, it shows these errors:

19:18:58-709192 INFO     Start training Dreambooth...                                                                                                                                
19:18:58-713359 INFO     Validating lr scheduler arguments...                                                                                                                        
19:18:58-716119 INFO     Validating optimizer arguments...                                                                                                                           
19:18:58-723061 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/logs existence and writability... SUCCESS                                                
19:18:58-725027 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs existence and writability... SUCCESS                                             
19:18:58-728468 INFO     Validating /home/huishi/workspace/data/repository/Common_Models/flux-dev-de-distill/consolidated_s6700.safetensors existence... SUCCESS                     
19:18:58-731598 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate existence... SUCCESS                                                      
19:18:58-736107 INFO     Validating /home/huishi/workspace/data/repository/ComfyUI/models/vae/ae.safetensors existence... SUCCESS                                                    
19:18:58-737613 INFO     Headless mode, skipping verification if model already exist... if model already exist it will be overwritten...                                             
19:18:58-742114 INFO     Folder 10_kon: 10 repeats found                                                                                                                             
19:18:58-745028 INFO     Folder 10_kon: 8 images found                                                                                                                               
19:18:58-746346 INFO     Folder 10_kon: 8 * 10 = 80 steps                                                                                                                            
19:18:58-747704 INFO     Regularization factor: 1                                                                                                                                    
19:18:58-748963 INFO     Total steps: 80                                                                                                                                             
19:18:58-750163 INFO     Train batch size: 1                                                                                                                                         
19:18:58-751377 INFO     Gradient accumulation steps: 1                                                                                                                              
19:18:58-752730 INFO     Epoch: 1                                                                                                                                                    
19:18:58-753782 INFO     Max train steps: 5000                                                                                                                                       
19:18:58-754771 INFO     lr_warmup_steps = 0.05                                                                                                                                      
19:18:58-763055 INFO     Saving training config to /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/kon_20241213-191858.json...                                       
19:18:58-768455 INFO     Executing command: /home/ps/anaconda3/envs/koyha/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1   
                         --num_machines 1 --num_cpu_threads_per_process 4 /home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py --config_file                     
                         /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858.toml                                                      
2024-12-13 19:19:06.195395: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-13 19:19:06.195433: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-13 19:19:06.197207: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-13 19:19:06.205571: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-13 19:19:07.312627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-12-13 19:19:08 WARNING  WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:                                                      _cpp_lib.py:144
                                 PyTorch 2.3.0+cu118 with CUDA 1108 (you have 2.4.0+cu118)                                                                                           
                                 Python  3.10.14 (you have 3.10.0)                                                                                                                   
                               Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)                                                      
                               Memory-efficient attention, SwiGLU, sparse and more won't be available.                                                                               
                               Set XFORMERS_MORE_DETAILS=1 for more details                                                                                                          
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:162: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:247: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/triton/softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/triton/softmax.py:87: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:107: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(cls, ctx, dx5):
2024-12-13 19:19:09 INFO     Loading settings from /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858.toml...       train_util.py:4528
                    INFO     /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858                                     train_util.py:4547
                    WARNING  cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なためcache_latentsを有効にします   train_util.py:4216
2024-12-13 19:19:09 WARNING  cache_text_encoder_outputs_to_disk is enabled, so cache_text_encoder_outputs is also enabled /                                          flux_train.py:70
                             cache_text_encoder_outputs_to_diskが有効になっているためcache_text_encoder_outputsも有効になります                                                    
                    INFO     Using DreamBooth method.                                                                                                               flux_train.py:115
                    INFO     prepare images.                                                                                                                       train_util.py:1971
                    INFO     get image size from name of cache files                                                                                               train_util.py:1886
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 29932.59it/s]
                    INFO     set image size from cache files: 8/8                                                                                                  train_util.py:1916
                    INFO     found directory /home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate/10_kon contains 8 image files                  train_util.py:1918
read caption: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 798.15it/s]
                    INFO     80 train images with repeating.                                                                                                       train_util.py:2012
                    INFO     0 reg images.                                                                                                                         train_util.py:2015
                    WARNING  no regularization images / 正則化画像が見つかりませんでした                                                                           train_util.py:2020
                    INFO     [Dataset 0]                                                                                                                           config_util.py:567
                               batch_size: 1                                                                                                                                         
                               resolution: (512, 512)                                                                                                                                
                               enable_bucket: True                                                                                                                                   
                               network_multiplier: 1.0                                                                                                                               
                               min_bucket_reso: 256                                                                                                                                  
                               max_bucket_reso: 2048                                                                                                                                 
                               bucket_reso_steps: 64                                                                                                                                 
                               bucket_no_upscale: True                                                                                                                               
                                                                                                                                                                                     
                               [Subset 0 of Dataset 0]                                                                                                                               
                                 image_dir: "/home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate/10_kon"                                                          
                                 image_count: 8                                                                                                                                      
                                 num_repeats: 10                                                                                                                                     
                                 shuffle_caption: False                                                                                                                              
                                 keep_tokens: 0                                                                                                                                      
                                 keep_tokens_separator:                                                                                                                              
                                 caption_separator: ,                                                                                                                                
                                 secondary_separator: None                                                                                                                           
                                 enable_wildcard: False                                                                                                                              
                                 caption_dropout_rate: 0                                                                                                                             
                                 caption_dropout_every_n_epochs: 0                                                                                                                   
                                 caption_tag_dropout_rate: 0.0                                                                                                                       
                                 caption_prefix: None                                                                                                                                
                                 caption_suffix: None                                                                                                                                
                                 color_aug: False                                                                                                                                    
                                 flip_aug: False                                                                                                                                     
                                 face_crop_aug_range: None                                                                                                                           
                                 random_crop: False                                                                                                                                  
                                 token_warmup_min: 1                                                                                                                                 
                                 token_warmup_step: 0                                                                                                                                
                                 alpha_mask: False                                                                                                                                   
                                 custom_attributes: {}                                                                                                                               
                                 is_reg: False                                                                                                                                       
                                 class_tokens: kon                                                                                                                                   
                                 caption_extension: .txt                                                                                                                             
                                                                                                                                                                                     
                                                                                                                                                                                     
                    INFO     [Dataset 0]                                                                                                                           config_util.py:573
                    INFO     loading image sizes.                                                                                                                   train_util.py:923
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 272800.26it/s]
                    INFO     make buckets                                                                                                                           train_util.py:946
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size              train_util.py:963
                             automatically /                                                                                                                                         
                             bucket_no_upscaleが指定された場合はbucketの解像度は画像サイズから自動計算されるためmin_bucket_resoとmax_bucket_resoは無視されます                   
                    INFO     number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）                                                            train_util.py:992
                    INFO     bucket 0: resolution (512, 512), count: 80                                                                                             train_util.py:997
                    INFO     mean ar error (without repeats): 0.0                                                                                                  train_util.py:1002
                    INFO     Checking the state dict: Diffusers or BFL, dev or schnell                                                                               flux_utils.py:43
                    INFO     prepare accelerator                                                                                                                    flux_train.py:185
accelerator device: cuda:0
                    INFO     Building AutoEncoder                                                                                                                   flux_utils.py:144
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/vae/ae.safetensors                                       flux_utils.py:149
                    INFO     Loaded AE: <All keys matched successfully>                                                                                             flux_utils.py:152
2024-12-13 19:19:10 INFO     [Dataset 0]                                                                                                                           train_util.py:2495
                    INFO     caching latents with caching strategy.                                                                                                train_util.py:1048
                    INFO     caching latents...                                                                                                                    train_util.py:1097
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 354.35it/s]
                    INFO     load tokenizer from cache: /home/huishi/workspace/data/repository/ComfyUI/models/clip/openai_clip-vit-large-patch14                  strategy_base.py:62
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
                    INFO     load tokenizer from cache: /home/huishi/workspace/data/repository/ComfyUI/models/clip/google_t5-v1_1-xxl                             strategy_base.py:62
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-12-13 19:19:11 INFO     Building CLIP-L                                                                                                                        flux_utils.py:179
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/clip/clip_l.safetensors                                  flux_utils.py:275
                    INFO     Loaded CLIP-L: <All keys matched successfully>                                                                                         flux_utils.py:278
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/clip/t5xxl_fp8_e4m3fn.safetensors                        flux_utils.py:330
2024-12-13 19:19:19 INFO     Loaded T5xxl: <All keys matched successfully>                                                                                          flux_utils.py:333
2024-12-13 19:19:23 INFO     [Dataset 0]                                                                                                                           train_util.py:2517
                    INFO     caching Text Encoder outputs with caching strategy.                                                                                   train_util.py:1231
                    INFO     checking cache validity...                                                                                                            train_util.py:1242
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 95.13it/s]
                    INFO     no Text Encoder outputs to cache                                                                                                      train_util.py:1269
                    INFO     cache Text Encoder outputs for sample prompt: /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/sample/prompt.txt        flux_train.py:248
                    INFO     cache Text Encoder outputs for prompt: masterpiece, best quality, 1girl, upper body, looking at viewer, red background                 flux_train.py:258
                    INFO     cache Text Encoder outputs for prompt: low quality, worst quality, bad anatomy, bad composition, poor, low effort                      flux_train.py:258
2024-12-13 19:19:24 INFO     Checking the state dict: Diffusers or BFL, dev or schnell                                                                               flux_utils.py:43
                    INFO     Building Flux model schnell from BFL checkpoint                                                                                        flux_utils.py:101
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/Common_Models/flux-dev-de-distill/consolidated_s6700.safetensors        flux_utils.py:118
                    INFO     Loaded Flux: <All keys matched successfully>                                                                                           flux_utils.py:137
FLUX: Gradient checkpointing enabled. CPU offload: False
                    INFO     enable block swap: blocks_to_swap=34                                                                                                   flux_train.py:303
FLUX: Block swap enabled. Swapping 34 blocks, double blocks: 17, single blocks: 34.
number of trainable parameters: 11891178560
prepare optimizer, data loader etc.
                    INFO     use Adafactor optimizer | {'relative_step': False, 'scale_parameter': False, 'warmup_init': False}                                    train_util.py:4841
                    WARNING  because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 /                                                          train_util.py:4869
                             max_grad_normが設定されているためclip_grad_normが有効になります0に設定して無効にしたほうがいいかもしれません                                          
enable full bf16 training.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
[rank0]:     train(args)
[rank0]:   File "/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py", line 461, in train
[rank0]:     flux = accelerator.prepare(flux, device_placement=[not is_swapping_blocks])
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1311, in prepare
[rank0]:     result = tuple(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1312, in <genexpr>
[rank0]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1188, in _prepare_one
[rank0]:     return self.prepare_model(obj, device_placement=device_placement)
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1452, in prepare_model
[rank0]:     model = torch.nn.parallel.DistributedDataParallel(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 739, in __init__
[rank0]:     self._log_and_throw(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1127, in _log_and_throw
[rank0]:     raise err_type(err_msg)
[rank0]: ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
E1213 19:19:26.401438 140374856504256 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1451534) of binary: /home/ps/anaconda3/envs/koyha/bin/python
Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/koyha/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-12-13_19:19:26
  host      : ps-Super-Server
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1451534)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
19:19:28-043803 INFO     Training has ended.     
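
From the traceback, accelerate went through multi_gpu_launcher and torch.distributed, so accelerator.prepare() wrapped the model in DistributedDataParallel even though --num_processes 1 was passed; DDP requires every parameter to sit on the target GPU, while blocks_to_swap deliberately keeps part of the FLUX model on the CPU, which is exactly what the ValueError says. A minimal sketch of that constraint (a toy nn.Linear stands in for the partially CPU-resident FLUX model; the gloo backend and world_size 1 are assumptions so the snippet runs stand-alone):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group just so DDP can be constructed (gloo: no GPU/NCCL needed here).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy module standing in for the FLUX model: its parameters are still on the CPU,
# just like the blocks held back by blocks_to_swap.
model = torch.nn.Linear(8, 8)

try:
    # This mirrors what accelerate's prepare_model() does in a distributed (multi-GPU) launch.
    DDP(model, device_ids=[0], output_device=0)
except ValueError as e:
    print(e)  # "... only work with single-device/multiple-device GPU modules or CPU modules ..."

dist.destroy_process_group()

If the goal is single-GPU training with block swap, forcing a non-distributed run (for example re-running accelerate config and answering that no distributed training is used, or exposing only one GPU via CUDA_VISIBLE_DEVICES) should avoid the DDP wrap entirely; that is an inference from the traceback, not something verified here.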
@Sylsatra

There are a few problems with your config:
First: are you trying to do multi-GPU training? If yes, check your GPU IDs in the Performance tab of Task Manager. If not, disable multi-GPU training.

Second: your CUDA and xformers versions are throwing warnings, so it is best to reinstall CUDA and xformers (or use SDPA instead).
"Please update PyTorch to 2.4.0. We have tested with torch==2.4.0 and torchvision==0.19.0 with CUDA 12.4. We also updated accelerate to 0.33.0 just to be safe. requirements.txt is also updated, so please update the requirements.
The command to install PyTorch is as follows: pip3 install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124"
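
For reference, a small diagnostic sketch to confirm what the warning in the log reports (torch 2.4.0+cu118 installed while xformers was built against torch 2.3.0+cu118) and how many GPUs the process actually sees; the exact values printed are assumptions taken from the log above, not verified here:

import torch

# Installed torch vs. the torch the xformers C++/CUDA extensions were compiled against
# is exactly what the "WARNING[XFORMERS]" message in the log is comparing.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)  # log suggests 2.4.0+cu118

try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers not installed")

# Relevant to the multi-GPU vs. single-GPU question: how many devices this process sees.
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print("  ", i, torch.cuda.get_device_name(i))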

@Nomination-NRB
Author

Thank you for your reply. I tried multi-GPU training before, but it kept failing, so I turned off the multi-GPU training parameters; that should not be the problem. Also, my CUDA can only support cu118, so perhaps the xformers build not matching my torch version is what causes the problem.
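
Since the installed xformers was built against a different torch, one workaround (an assumption, not something tested in this thread) is to train with PyTorch's native scaled_dot_product_attention instead of xformers, e.g. the SDPA attention option in kohya_ss/sd-scripts if it is exposed for FLUX training. A minimal check that SDPA itself works in this environment:

import torch
import torch.nn.functional as F

# Sanity check: PyTorch's built-in SDPA runs in bf16 on the GPU, so attention does not
# have to rely on the mismatched xformers extensions.
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.dtype)  # torch.Size([1, 8, 16, 64]) torch.bfloat16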
