Issues: ROCm/TransformerEngine
#82: [Out of Box Experience]: ROCm Transformer Engine Should Be Included in AMD Pytorch Images (opened Oct 18, 2024 by OrenLeung)
#79: [FSDP 8xMI300X] Llama3 8B FP8 is 21% slower than BF16 & OOMs on the same batch size (opened Oct 15, 2024 by OrenLeung)
#76: [DDP 8xMI300X] GPT2-1.5B FP8 is 25% slower than BF16 & OOMs on the same batch size (opened Oct 15, 2024 by OrenLeung)
#72: [1xMI300X] GPT-2 XL 1.5B FP8 Training ~30% slower than H100 FP8 (opened Oct 13, 2024 by OrenLeung)
#34: [TE] Investigate parallelism implementation in Transformer Engine (opened Apr 23, 2024 by wangye805, 4 of 5 tasks complete)