Hi @pacman100,

Thank you for the well-detailed article! I'd like to share some feedback and ask for clarification on two points.
Optimizer Selection:
The blog post uses the "paged_adamw_32bit" optimizer. When I changed it to "adamw_torch", I ran into an out-of-memory (OOM) error. Could you explain how critical the default optimizer is to running the example successfully, and why "adamw_torch" runs out of memory when "paged_adamw_32bit" does not? Any insight would be very valuable.
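To frame the question, here is my rough back-of-the-envelope estimate of AdamW's optimizer-state memory. It assumes (my assumptions, not from the blog post) two fp32 moment tensors per trainable parameter and state sharded evenly across all ranks. If that is right, both optimizers keep the same 32-bit state, so my guess is that the difference comes from the paged optimizer moving state to CPU memory under memory pressure, but I may be misreading how it works. The model size below is hypothetical.

```python
# Back-of-the-envelope AdamW optimizer-state memory estimate.
# Assumptions (not from the blog post): two fp32 moment tensors
# (exp_avg, exp_avg_sq) per trainable parameter, and optimizer state
# sharded evenly across all ranks (e.g. fully sharded data parallel).

BYTES_PER_FP32 = 4
ADAM_STATES_PER_PARAM = 2  # first and second moment estimates


def adamw_state_bytes(num_trainable_params: int) -> int:
    """Total bytes for AdamW's 32-bit moment buffers."""
    return num_trainable_params * ADAM_STATES_PER_PARAM * BYTES_PER_FP32


def per_gpu_gib(total_bytes: int, world_size: int) -> float:
    """Optimizer-state size per rank, in GiB, assuming even sharding."""
    return total_bytes / world_size / 2**30


if __name__ == "__main__":
    # Hypothetical full fine-tune of a 70B-parameter model on 2 nodes x 8 GPUs.
    total = adamw_state_bytes(70_000_000_000)
    print(f"{per_gpu_gib(total, world_size=16):.1f} GiB of optimizer state per GPU")
```

For that hypothetical case this gives about 32.6 GiB per GPU for optimizer state alone, before weights, gradients, and activations.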
GPU Utilization:
When replicating the described setup on 2 nodes with 8 H100 GPUs each, I observed low GPU utilization of around 20%, with each GPU drawing only about 200 W. Do you have any recommendations for raising GPU utilization, so that training runs faster and makes full use of the available hardware?
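Since the utilization figure reported by nvidia-smi only reflects the fraction of time any kernel is running, a throughput-based number may be a more useful way to compare runs. Below is a rough model-FLOPs-utilization (MFU) sketch. It assumes the common approximation of about 6 FLOPs per parameter per training token (ignoring attention FLOPs) and a placeholder dense BF16 peak of 989e12 FLOP/s per H100 SXM GPU; both are my assumptions, and the throughput in the example is hypothetical.

```python
# Rough model-FLOPs-utilization (MFU) estimate for a decoder-only transformer.
# Assumptions: ~6 FLOPs per parameter per training token (forward + backward),
# attention FLOPs ignored, and an assumed dense BF16 peak per H100 SXM GPU;
# check the datasheet for your exact GPU SKU.

H100_BF16_DENSE_PEAK = 989e12  # FLOP/s per GPU, assumed value


def model_flops_utilization(num_params: float, tokens_per_second: float,
                            num_gpus: int,
                            peak_flops_per_gpu: float = H100_BF16_DENSE_PEAK) -> float:
    """Fraction of the cluster's theoretical peak FLOP/s actually used."""
    achieved = 6 * num_params * tokens_per_second  # cluster-wide FLOP/s
    return achieved / (num_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Hypothetical numbers: replace with the model size and the tokens/s
    # measured from your own training logs.
    print(f"MFU: {model_flops_utilization(70e9, 4000, num_gpus=16):.1%}")
```

Comparing this number before and after a configuration change (batch size, sequence length, and so on) gives a clearer signal than GPU power draw alone.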
Your guidance would be immensely helpful!
Thank you for your time.