Flux ControlNet Training Multi-GPU DeepSpeed Stage-3 doesn't reduce memory compared to Single GPU #10027
Unanswered
enesmsahin asked this question in Q&A
Replies: 1 comment
This doesn't seem like a …
Describe the bug
I am running a slightly modified version of the Flux ControlNet training script in diffusers; the script is linked below. I am using DeepSpeed Stage-3 with the accelerate config below.
When I use only 1 GPU (configured via the accelerate config file below), training takes around 42 GB of GPU memory. When I use all 8 GPUs in a single node, it still takes around 42 GB per GPU.
I am not familiar with the parallelization details of DeepSpeed, but I would expect Stage-3 to shard the model weights (in addition to gradients and optimizer states) across the 8 GPUs and therefore use noticeably less memory per GPU than in the single-GPU case.
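For reference, here is a rough back-of-the-envelope sketch of the per-GPU model-state memory I would expect under ZeRO sharding. The numbers are purely illustrative assumptions (a ~3B-parameter trainable ControlNet in bf16 with fp32 Adam states), not measurements from my run:

```python
# Back-of-the-envelope per-GPU memory for the *trainable model states* under
# DeepSpeed ZeRO, ignoring activations, frozen models, buffers, and fragmentation.
# The 3e9 parameter count and bf16/Adam setup are assumptions for illustration only.

def zero_model_state_mem_gb(n_params: float, n_gpus: int, stage: int) -> float:
    bytes_params = 2 * n_params   # bf16 weights
    bytes_grads = 2 * n_params    # bf16 gradients
    bytes_optim = 12 * n_params   # fp32 master weights + Adam exp_avg + exp_avg_sq

    if stage == 1:      # ZeRO-1: optimizer states sharded
        per_gpu = bytes_params + bytes_grads + bytes_optim / n_gpus
    elif stage == 2:    # ZeRO-2: gradients and optimizer states sharded
        per_gpu = bytes_params + (bytes_grads + bytes_optim) / n_gpus
    elif stage == 3:    # ZeRO-3: parameters sharded as well
        per_gpu = (bytes_params + bytes_grads + bytes_optim) / n_gpus
    else:               # no sharding (plain DDP-style replication)
        per_gpu = bytes_params + bytes_grads + bytes_optim
    return per_gpu / 1024**3

for n_gpus in (1, 8):
    print(f"stage 3, {n_gpus} GPU(s): {zero_model_state_mem_gb(3e9, n_gpus, 3):.1f} GB")
# stage 3, 1 GPU(s): 44.7 GB
# stage 3, 8 GPU(s): 5.6 GB
```

Even allowing for activations and other overhead, I would expect the per-GPU model-state footprint on 8 GPUs to be several times smaller than on 1 GPU, which is not what I observe.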
PS: I am not sure whether this issue is related to the ControlNet training script in diffusers or to accelerate, so I have opened the same issue in the accelerate repository as well.
Reproduction
Link to the script: https://pastebin.com/SdQZcQR8
Command used to run the script:
Accelerate Config File
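As a rough illustration of the intended setup, the same Stage-3 configuration can also be expressed programmatically with accelerate's DeepSpeedPlugin; the specific values below are placeholders rather than my exact settings:

```python
# Illustrative programmatic equivalent of a ZeRO Stage-3 accelerate setup.
# Intended to be run via `accelerate launch`, like the training script.
from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,                   # shard params, grads, and optimizer states
    gradient_accumulation_steps=1,  # placeholder
    gradient_clipping=1.0,          # placeholder
    zero3_init_flag=True,           # initialize large models directly into shards
    zero3_save_16bit_model=True,    # gather a full 16-bit model when saving
)

accelerator = Accelerator(
    mixed_precision="bf16",         # placeholder; should match the training script
    deepspeed_plugin=deepspeed_plugin,
)
```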
Logs
No response
System Info
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
NVIDIA A100-SXM4-80GB, 81920 MiB
Who can help?
@PromeAIpro @sayakpaul