Publishes rosetta CUDA 12.1 and containers for pax and t5x AND containers for nightlies based on jax-pinned images for CUDA 12.1 and 12.2 #286
Will this build arm64 images once #284 is merged?
We probably don't want to build arm64+CUDA12.1 images since they do not work on Grace Hopper.
Oops, you are right. I need to rebase this on #284 and add only the amd64 platform. Will do that now.
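The platform restriction discussed above can be sketched as a small build-matrix filter. This is only an illustration, not the repository's actual workflow code: the names `CUDA_VERSIONS`, `PLATFORMS`, `EXCLUDED`, and `build_matrix` are hypothetical, and building arm64 for CUDA 12.2 is an assumption that the thread does not confirm.

```python
from itertools import product

# Hypothetical values for illustration; the real workflow inputs may differ.
CUDA_VERSIONS = ["12.1", "12.2"]
PLATFORMS = ["amd64", "arm64"]

# Combinations to skip. Per the discussion, arm64 + CUDA 12.1 images
# do not work on Grace Hopper, so they are not worth building.
EXCLUDED = {("12.1", "arm64")}


def build_matrix(cuda_versions=CUDA_VERSIONS, platforms=PLATFORMS, excluded=EXCLUDED):
    """Return the (cuda, platform) pairs to build, minus excluded combinations."""
    return [
        (cuda, platform)
        for cuda, platform in product(cuda_versions, platforms)
        if (cuda, platform) not in excluded
    ]


if __name__ == "__main__":
    for cuda, platform in build_matrix():
        print(f"CUDA {cuda} on linux/{platform}")
```

Keeping the exclusions in one explicit set makes it easy to drop the restriction later if the unsupported combination starts working.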
Force-pushed from 0c9f07a to 22aab07.
Done. Launched this run to test: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6436638266 (edit: the build failed, but not because of this change)
intended for t5x based images
Force-pushed from b1a6b09 to 6eaf361.
Re-running CI after rebasing on top of the multiarch PR to make sure nothing broke: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6465287847
Ahh, even that build failed. I forgot that I haven't cherry-picked @yhtang's fix for NCCL in the 12.1 images.
Rerunning again after rebasing onto the latest NCCL fix: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6466124323
workflows. Will rely on jax-toolbox-internal
Launched sandbox CI:
(edit: re-ran due to a t5x build failure; still waiting on google-research/t5x#1416, but the workflow is configured to default to this branch for now)
Reran on top-of-tree JAX/XLA after openxla/xla#6305 was merged: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6512759676 (build succeeded).