
Enhancements for Efficient Utilization and Optimization in Fine-tuning Llama 2 70B Example #7

Open
adamlin120 opened this issue Oct 4, 2023

Hi @pacman100,

Firstly, thank you for the well-detailed article! I am writing to provide some feedback and seek clarification.

  1. Optimizer Selection:

    • The blog post uses the "paged_adamw_32bit" optimizer. When I change this to "adamw_torch", I hit an Out Of Memory (OOM) error. Could you clarify what role the default optimizer plays in keeping the example within memory, and why the OOM appears with "adamw_torch"? (The first sketch after this list shows the single setting I am changing.)
  2. GPU Utilization:

    • In attempting to replicate the described setup on 2 nodes × 8 H100s, I observed GPU utilization of only around 20%, with each GPU drawing roughly 200 W (measured as in the second sketch after this list). Do you have any recommendations for raising GPU utilization to speed up training and make full use of the hardware?
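For reference, here is a minimal sketch of the only setting I am changing between the two runs; the other arguments are placeholders rather than the exact values from your example:

```python
from transformers import TrainingArguments

# Placeholder values except for `optim`, which is the one knob I am toggling.
training_args = TrainingArguments(
    output_dir="./llama-70b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    bf16=True,
    optim="paged_adamw_32bit",   # as in the blog post: runs fine
    # optim="adamw_torch",       # same hardware and settings: OOM
)
```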
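And for context on where the utilization numbers come from, this is roughly how I am polling the GPUs on each node (a small monitoring sketch using pynvml, separate from the training script):

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
try:
    while True:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            print(f"GPU {i}: util={util}% power={watts:.0f} W")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```

During training this reports roughly 20% utilization and about 200 W per GPU.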

Your guidance will be immensely beneficial!

Thank you for your time.
