
use Blocks to swap error #3015

Open
Nomination-NRB opened this issue Dec 13, 2024 · 2 comments

Comments

@Nomination-NRB

Nomination-NRB commented Dec 13, 2024

I want to do full checkpoint (DreamBooth) training for FLUX, using flux-dev-de-distill as the base model, ae.safetensors as the VAE, and clip_l.safetensors and t5xxl_fp8_e4m3fn.safetensors as the text encoders, on a single 3090.

(I tried using two 3090s, but it always fails even when I set the correct parameters in the accelerate launch tab.)

[screenshot of the training settings]

When I use Blocks to swap, it shows these errors:

19:18:58-709192 INFO     Start training Dreambooth...                                                                                                                                
19:18:58-713359 INFO     Validating lr scheduler arguments...                                                                                                                        
19:18:58-716119 INFO     Validating optimizer arguments...                                                                                                                           
19:18:58-723061 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/logs existence and writability... SUCCESS                                                
19:18:58-725027 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs existence and writability... SUCCESS                                             
19:18:58-728468 INFO     Validating /home/huishi/workspace/data/repository/Common_Models/flux-dev-de-distill/consolidated_s6700.safetensors existence... SUCCESS                     
19:18:58-731598 INFO     Validating /home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate existence... SUCCESS                                                      
19:18:58-736107 INFO     Validating /home/huishi/workspace/data/repository/ComfyUI/models/vae/ae.safetensors existence... SUCCESS                                                    
19:18:58-737613 INFO     Headless mode, skipping verification if model already exist... if model already exist it will be overwritten...                                             
19:18:58-742114 INFO     Folder 10_kon: 10 repeats found                                                                                                                             
19:18:58-745028 INFO     Folder 10_kon: 8 images found                                                                                                                               
19:18:58-746346 INFO     Folder 10_kon: 8 * 10 = 80 steps                                                                                                                            
19:18:58-747704 INFO     Regularization factor: 1                                                                                                                                    
19:18:58-748963 INFO     Total steps: 80                                                                                                                                             
19:18:58-750163 INFO     Train batch size: 1                                                                                                                                         
19:18:58-751377 INFO     Gradient accumulation steps: 1                                                                                                                              
19:18:58-752730 INFO     Epoch: 1                                                                                                                                                    
19:18:58-753782 INFO     Max train steps: 5000                                                                                                                                       
19:18:58-754771 INFO     lr_warmup_steps = 0.05                                                                                                                                      
19:18:58-763055 INFO     Saving training config to /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/kon_20241213-191858.json...                                       
19:18:58-768455 INFO     Executing command: /home/ps/anaconda3/envs/koyha/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1   
                         --num_machines 1 --num_cpu_threads_per_process 4 /home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py --config_file                     
                         /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858.toml                                                      
2024-12-13 19:19:06.195395: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-13 19:19:06.195433: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-13 19:19:06.197207: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-13 19:19:06.205571: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-12-13 19:19:07.312627: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-12-13 19:19:08 WARNING  WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:                                                      _cpp_lib.py:144
                                 PyTorch 2.3.0+cu118 with CUDA 1108 (you have 2.4.0+cu118)                                                                                           
                                 Python  3.10.14 (you have 3.10.0)                                                                                                                   
                               Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)                                                      
                               Memory-efficient attention, SwiGLU, sparse and more won't be available.                                                                               
                               Set XFORMERS_MORE_DETAILS=1 for more details                                                                                                          
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:162: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:247: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/triton/softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/triton/softmax.py:87: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:107: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(cls, ctx, dx5):
2024-12-13 19:19:09 INFO     Loading settings from /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858.toml...       train_util.py:4528
                    INFO     /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/config_dreambooth-20241213-191858                                     train_util.py:4547
                    WARNING  cache_latents_to_disk is enabled, so cache_latents is also enabled / cache_latents_to_diskが有効なためcache_latentsを有効にします   train_util.py:4216
2024-12-13 19:19:09 WARNING  cache_text_encoder_outputs_to_disk is enabled, so cache_text_encoder_outputs is also enabled /                                          flux_train.py:70
                             cache_text_encoder_outputs_to_diskが有効になっているためcache_text_encoder_outputsも有効になります                                                    
                    INFO     Using DreamBooth method.                                                                                                               flux_train.py:115
                    INFO     prepare images.                                                                                                                       train_util.py:1971
                    INFO     get image size from name of cache files                                                                                               train_util.py:1886
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 29932.59it/s]
                    INFO     set image size from cache files: 8/8                                                                                                  train_util.py:1916
                    INFO     found directory /home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate/10_kon contains 8 image files                  train_util.py:1918
read caption: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 798.15it/s]
                    INFO     80 train images with repeating.                                                                                                       train_util.py:2012
                    INFO     0 reg images.                                                                                                                         train_util.py:2015
                    WARNING  no regularization images / 正則化画像が見つかりませんでした                                                                           train_util.py:2020
                    INFO     [Dataset 0]                                                                                                                           config_util.py:567
                               batch_size: 1                                                                                                                                         
                               resolution: (512, 512)                                                                                                                                
                               enable_bucket: True                                                                                                                                   
                               network_multiplier: 1.0                                                                                                                               
                               min_bucket_reso: 256                                                                                                                                  
                               max_bucket_reso: 2048                                                                                                                                 
                               bucket_reso_steps: 64                                                                                                                                 
                               bucket_no_upscale: True                                                                                                                               
                                                                                                                                                                                     
                               [Subset 0 of Dataset 0]                                                                                                                               
                                 image_dir: "/home/huishi/workspace/data/repository/kohya_ss/dataset/images/animate/10_kon"                                                          
                                 image_count: 8                                                                                                                                      
                                 num_repeats: 10                                                                                                                                     
                                 shuffle_caption: False                                                                                                                              
                                 keep_tokens: 0                                                                                                                                      
                                 keep_tokens_separator:                                                                                                                              
                                 caption_separator: ,                                                                                                                                
                                 secondary_separator: None                                                                                                                           
                                 enable_wildcard: False                                                                                                                              
                                 caption_dropout_rate: 0                                                                                                                             
                                 caption_dropout_every_n_epochs: 0                                                                                                                   
                                 caption_tag_dropout_rate: 0.0                                                                                                                       
                                 caption_prefix: None                                                                                                                                
                                 caption_suffix: None                                                                                                                                
                                 color_aug: False                                                                                                                                    
                                 flip_aug: False                                                                                                                                     
                                 face_crop_aug_range: None                                                                                                                           
                                 random_crop: False                                                                                                                                  
                                 token_warmup_min: 1                                                                                                                                 
                                 token_warmup_step: 0                                                                                                                                
                                 alpha_mask: False                                                                                                                                   
                                 custom_attributes: {}                                                                                                                               
                                 is_reg: False                                                                                                                                       
                                 class_tokens: kon                                                                                                                                   
                                 caption_extension: .txt                                                                                                                             
                                                                                                                                                                                     
                                                                                                                                                                                     
                    INFO     [Dataset 0]                                                                                                                           config_util.py:573
                    INFO     loading image sizes.                                                                                                                   train_util.py:923
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 272800.26it/s]
                    INFO     make buckets                                                                                                                           train_util.py:946
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size              train_util.py:963
                             automatically /                                                                                                                                         
                             bucket_no_upscaleが指定された場合はbucketの解像度は画像サイズから自動計算されるためmin_bucket_resoとmax_bucket_resoは無視されます                   
                    INFO     number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）                                                            train_util.py:992
                    INFO     bucket 0: resolution (512, 512), count: 80                                                                                             train_util.py:997
                    INFO     mean ar error (without repeats): 0.0                                                                                                  train_util.py:1002
                    INFO     Checking the state dict: Diffusers or BFL, dev or schnell                                                                               flux_utils.py:43
                    INFO     prepare accelerator                                                                                                                    flux_train.py:185
accelerator device: cuda:0
                    INFO     Building AutoEncoder                                                                                                                   flux_utils.py:144
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/vae/ae.safetensors                                       flux_utils.py:149
                    INFO     Loaded AE: <All keys matched successfully>                                                                                             flux_utils.py:152
2024-12-13 19:19:10 INFO     [Dataset 0]                                                                                                                           train_util.py:2495
                    INFO     caching latents with caching strategy.                                                                                                train_util.py:1048
                    INFO     caching latents...                                                                                                                    train_util.py:1097
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 354.35it/s]
                    INFO     load tokenizer from cache: /home/huishi/workspace/data/repository/ComfyUI/models/clip/openai_clip-vit-large-patch14                  strategy_base.py:62
/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
                    INFO     load tokenizer from cache: /home/huishi/workspace/data/repository/ComfyUI/models/clip/google_t5-v1_1-xxl                             strategy_base.py:62
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-12-13 19:19:11 INFO     Building CLIP-L                                                                                                                        flux_utils.py:179
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/clip/clip_l.safetensors                                  flux_utils.py:275
                    INFO     Loaded CLIP-L: <All keys matched successfully>                                                                                         flux_utils.py:278
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/ComfyUI/models/clip/t5xxl_fp8_e4m3fn.safetensors                        flux_utils.py:330
2024-12-13 19:19:19 INFO     Loaded T5xxl: <All keys matched successfully>                                                                                          flux_utils.py:333
2024-12-13 19:19:23 INFO     [Dataset 0]                                                                                                                           train_util.py:2517
                    INFO     caching Text Encoder outputs with caching strategy.                                                                                   train_util.py:1231
                    INFO     checking cache validity...                                                                                                            train_util.py:1242
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 95.13it/s]
                    INFO     no Text Encoder outputs to cache                                                                                                      train_util.py:1269
                    INFO     cache Text Encoder outputs for sample prompt: /home/huishi/workspace/data/repository/kohya_ss/dataset/outputs/sample/prompt.txt        flux_train.py:248
                    INFO     cache Text Encoder outputs for prompt: masterpiece, best quality, 1girl, upper body, looking at viewer, red background                 flux_train.py:258
                    INFO     cache Text Encoder outputs for prompt: low quality, worst quality, bad anatomy, bad composition, poor, low effort                      flux_train.py:258
2024-12-13 19:19:24 INFO     Checking the state dict: Diffusers or BFL, dev or schnell                                                                               flux_utils.py:43
                    INFO     Building Flux model schnell from BFL checkpoint                                                                                        flux_utils.py:101
                    INFO     Loading state dict from /home/huishi/workspace/data/repository/Common_Models/flux-dev-de-distill/consolidated_s6700.safetensors        flux_utils.py:118
                    INFO     Loaded Flux: <All keys matched successfully>                                                                                           flux_utils.py:137
FLUX: Gradient checkpointing enabled. CPU offload: False
                    INFO     enable block swap: blocks_to_swap=34                                                                                                   flux_train.py:303
FLUX: Block swap enabled. Swapping 34 blocks, double blocks: 17, single blocks: 34.
number of trainable parameters: 11891178560
prepare optimizer, data loader etc.
                    INFO     use Adafactor optimizer | {'relative_step': False, 'scale_parameter': False, 'warmup_init': False}                                    train_util.py:4841
                    WARNING  because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 /                                                          train_util.py:4869
                             max_grad_normが設定されているためclip_grad_normが有効になります0に設定して無効にしたほうがいいかもしれません                                          
enable full bf16 training.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
[rank0]:     train(args)
[rank0]:   File "/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py", line 461, in train
[rank0]:     flux = accelerator.prepare(flux, device_placement=[not is_swapping_blocks])
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1311, in prepare
[rank0]:     result = tuple(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1312, in <genexpr>
[rank0]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1188, in _prepare_one
[rank0]:     return self.prepare_model(obj, device_placement=device_placement)
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/accelerator.py", line 1452, in prepare_model
[rank0]:     model = torch.nn.parallel.DistributedDataParallel(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 739, in __init__
[rank0]:     self._log_and_throw(
[rank0]:   File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1127, in _log_and_throw
[rank0]:     raise err_type(err_msg)
[rank0]: ValueError: DistributedDataParallel device_ids and output_device arguments only work with single-device/multiple-device GPU modules or CPU modules, but got device_ids [0], output_device 0, and module parameters {device(type='cpu')}.
E1213 19:19:26.401438 140374856504256 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1451534) of binary: /home/ps/anaconda3/envs/koyha/bin/python
Traceback (most recent call last):
  File "/home/ps/anaconda3/envs/koyha/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ps/anaconda3/envs/koyha/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/home/huishi/workspace/data/repository/kohya_ss/sd-scripts/flux_train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-12-13_19:19:26
  host      : ps-Super-Server
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1451534)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
19:19:28-043803 INFO     Training has ended.     
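
From the traceback, accelerate went through multi_gpu_launcher and torch.distributed, so accelerator.prepare() wrapped the model in DistributedDataParallel even though --num_processes 1 was passed; DDP requires every parameter to sit on the target GPU, while blocks_to_swap deliberately keeps part of the FLUX model on the CPU, which is exactly what the ValueError says. A minimal sketch of that constraint (a toy nn.Linear stands in for the partially CPU-resident FLUX model; the gloo backend and world_size 1 are assumptions so the snippet runs stand-alone):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group just so DDP can be constructed (gloo: no GPU/NCCL needed here).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy module standing in for the FLUX model: its parameters are still on the CPU,
# just like the blocks held back by blocks_to_swap.
model = torch.nn.Linear(8, 8)

try:
    # This mirrors what accelerate's prepare_model() does in a distributed (multi-GPU) launch.
    DDP(model, device_ids=[0], output_device=0)
except ValueError as e:
    print(e)  # "... only work with single-device/multiple-device GPU modules or CPU modules ..."

dist.destroy_process_group()

If the goal is single-GPU training with block swap, forcing a non-distributed run (for example re-running accelerate config and answering that no distributed training is used, or exposing only one GPU via CUDA_VISIBLE_DEVICES) should avoid the DDP wrap entirely; that is an inference from the traceback, not something verified here.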
@Sylsatra

There are a few problems with your config:
First: are you trying to do multi-GPU training? If yes, check your GPU IDs in the Performance tab of Task Manager. If not, disable multi-GPU training.

Second: your CUDA and xformers versions are throwing warnings, so it is best to reinstall CUDA and xformers (or use SDPA instead).
"Please update PyTorch to 2.4.0. We have tested with torch==2.4.0 and torchvision==0.19.0 with CUDA 12.4. We also updated accelerate to 0.33.0 just to be safe. requirements.txt is also updated, so please update the requirements.
The command to install PyTorch is as follows: pip3 install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124"
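
For reference, a small diagnostic sketch to confirm what the warning in the log reports (torch 2.4.0+cu118 installed while xformers was built against torch 2.3.0+cu118) and how many GPUs the process actually sees; the exact values printed are assumptions taken from the log above, not verified here:

import torch

# Installed torch vs. the torch the xformers C++/CUDA extensions were compiled against
# is exactly what the "WARNING[XFORMERS]" message in the log is comparing.
print("torch:", torch.__version__, "cuda:", torch.version.cuda)  # log suggests 2.4.0+cu118

try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers not installed")

# Relevant to the multi-GPU vs. single-GPU question: how many devices this process sees.
print("visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print("  ", i, torch.cuda.get_device_name(i))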

@Nomination-NRB
Author

Thank you for your reply. I tried multi-GPU training before, but it kept failing, so I turned off the multi-GPU training parameters; that should not be the problem. Also, my CUDA can only support cu118, so perhaps the xformers build not matching my torch version is what causes the problem.
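
Since the installed xformers was built against a different torch, one workaround (an assumption, not something tested in this thread) is to train with PyTorch's native scaled_dot_product_attention instead of xformers, e.g. the SDPA attention option in kohya_ss/sd-scripts if it is exposed for FLUX training. A minimal check that SDPA itself works in this environment:

import torch
import torch.nn.functional as F

# Sanity check: PyTorch's built-in SDPA runs in bf16 on the GPU, so attention does not
# have to rely on the mismatched xformers extensions.
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.bfloat16)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape, out.dtype)  # torch.Size([1, 8, 16, 64]) torch.bfloat16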
