Flux ControlNet Training Multi-GPU DeepSpeed Stage-3 doesn't reduce memory compared to Single GPU #10026
Labels: bug
Describe the bug
I am running a slightly modified version of the Flux ControlNet training script from diffusers; the script is linked below. I am using DeepSpeed Stage-3 with the accelerate config below.
When I use only 1 GPU (configured via the accelerate config file below), training takes around 42 GB of GPU memory. When I use all 8 GPUs on a single node, it still takes around 42 GB per GPU.
I am not familiar with the parallelization details of DeepSpeed, but I would expect ZeRO Stage-3 to shard the model weights, gradients, and optimizer states across the 8 GPUs, and therefore to use noticeably less memory per GPU than in the single-GPU case.
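For reference, here is a minimal sketch (a hypothetical diagnostic, not part of the linked script) of how one could check from inside the training loop whether ZeRO-3 is actually partitioning the weights. DeepSpeed attaches `ds_*` attributes to parameters it manages, so on 8 GPUs each rank should hold roughly 1/8 of the parameters:

```python
import torch

def log_sharding_state(accelerator, model):
    """Log how many parameters this rank holds locally and its GPU memory use.

    Hypothetical helper, not part of the original script; call it once
    after accelerator.prepare(...).
    """
    total_numel, local_numel = 0, 0
    for p in model.parameters():
        if hasattr(p, "ds_numel"):
            # ZeRO-3-managed parameter: ds_numel is the full parameter size,
            # ds_tensor is the shard stored on this rank.
            total_numel += p.ds_numel
            local_numel += p.ds_tensor.numel()
        else:
            # Unmanaged parameter: fully replicated on every rank.
            total_numel += p.numel()
            local_numel += p.numel()
    print(
        f"rank {accelerator.process_index}: holds "
        f"{local_numel / max(total_numel, 1):.1%} of "
        f"{total_numel / 1e9:.2f}B params, "
        f"allocated {torch.cuda.memory_allocated() / 2**30:.1f} GiB, "
        f"peak {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB",
        flush=True,
    )
```

If every rank reports close to 100% of the parameters, the weights are still fully replicated and Stage-3 sharding never took effect, which would explain the flat 42 GB per GPU.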
PS: I am not sure whether this issue is related to the ControlNet training script in diffusers or to accelerate, so I have opened the same issue in accelerate as well.
Reproduction
Link to the script: https://pastebin.com/SdQZcQR8
Command used to run the script:
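(The exact command did not survive in this extract. A typical invocation, using hypothetical file names for illustration — `accelerate_config.yaml` for the config below and `train_controlnet_flux.py` for the linked script — would look like:)

```bash
# Illustrative only; the actual command and training arguments were not included.
accelerate launch --config_file accelerate_config.yaml train_controlnet_flux.py <training args>
```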
Accelerate Config File
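(The config body is likewise missing from this extract. A minimal sketch of an accelerate config that enables DeepSpeed ZeRO Stage-3, assuming bf16 mixed precision and no CPU offload; `num_processes` would be 1 for the single-GPU run and 8 for the multi-GPU run:)

```yaml
# Sketch only; the author's actual config file was not preserved.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16
num_machines: 1
num_processes: 8
use_cpu: false
```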
Logs
No response
System Info
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
Who can help?
@PromeAIpro @sayakpaul