System Info

Information

Tasks

- One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
I am running a slightly modified version of the Flux ControlNet training script in diffusers; the script is attached below. I am using DeepSpeed Stage-3 with the accelerate config below.

When I use only one GPU (configured via the accelerate config file below), training takes around 42 GB of GPU memory. When I use all 8 GPUs in a single node, it still takes around 42 GB per GPU.
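For reference, a minimal accelerate config enabling ZeRO Stage-3 looks like the sketch below. This is an illustrative example, not the exact file used for the run above; values such as `num_processes` and `mixed_precision` are assumptions for an 8-GPU node.

```yaml
# Illustrative accelerate config for DeepSpeed ZeRO Stage-3
# (not the original file; values below are assumed).
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3                  # shard params, grads, and optimizer states
  zero3_init_flag: true          # construct the model directly in sharded form
  offload_optimizer_device: none
  offload_param_device: none
  gradient_accumulation_steps: 1
mixed_precision: bf16
num_machines: 1
num_processes: 8                 # one process per GPU
```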
Command used to run the script:
Link to the script: https://pastebin.com/SdQZcQR8
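(The exact command did not survive; a typical invocation with the config above would be along the lines of `accelerate launch --config_file ds_zero3.yaml train_controlnet_flux.py ...`, where both file names here are hypothetical.)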
Expected behavior
I would expect DeepSpeed Stage-3 to shard the model weights across the 8 GPUs and therefore reduce per-GPU memory usage compared to the single-GPU case.
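As a sanity check (my own sketch, not part of the attached script): under ZeRO-3, each parameter managed by DeepSpeed carries `ds_numel` (full element count) and `ds_tensor` (this rank's shard), so after `accelerator.prepare(...)` one can print how much of the model is actually resident per rank:

```python
# Diagnostic sketch, assuming DeepSpeed ZeRO-3 internals: partitioned
# parameters expose `ds_numel` (full size) and `ds_tensor` (local shard).
def report_zero3_sharding(model):
    full, local = 0, 0
    for p in model.parameters():
        if hasattr(p, "ds_numel"):      # parameter is partitioned by ZeRO-3
            full += p.ds_numel
            local += p.ds_tensor.numel()
        else:                           # parameter was never partitioned
            full += p.numel()
            local += p.numel()
    pct = 100.0 * local / max(full, 1)
    print(f"full model: {full:,} params | resident on this rank: {local:,} ({pct:.1f}%)")
```

With Stage-3 sharding working across 8 ranks, each rank should report roughly 1/8 of the parameters resident; 100% would indicate the weights are not being partitioned at all, which would be consistent with the identical 42 GB readings above.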