-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates rosetta CI to build multiarch (Arm64/amd64) for pax #284
Conversation
|
||
- name: Pull and validate BASE_IMAGE=${{ env.BASE_IMAGE }} contains BASE_LIBRARY=${{ inputs.BASE_LIBRARY }} | ||
- name: Pull and validate BASE_IMAGE=${{ steps.defaults.outputs.BASE_IMAGE }} contains BASE_LIBRARY=${{ inputs.BASE_LIBRARY }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sanity mainly. I was thinking in the future with OCI annotations, we could just inspect the environment vars: https://github.com/NVIDIA/JAX-Toolbox/blob/main/.github/workflows/scripts/get_remote_env.sh and then avoid us having to pull the image; but didn't implement that yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I failed to mention why it's required; since (rosetta) pax does not re-install paxml, having paxml already installed in editable is required for the rosetta build at the moment. Once we have reproducible builds addressed, we can probably toss this step out b/c we'll just reinstall it after the distribution has been formed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the need for the check is transient and will be eliminated soon with pip-compile, I'd recommend that we simply don't do it. The build process will also fail if the library is not present, right? It should be fine as long as the wrong library does not cause a silent failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
The tests failed, likely due to external factors. Rerunning to confirm,. |
@terrykong could you please confirm if the build and tests are now successful? |
edit: The pax build is okay, the failure is in E2E time which can fail from time-to-time. T5x is failing due to a build error, which I have a fix for |
|
intended for t5x based images
Not sure why this wasn't caught earlier, but rosetta-pax fails b/c of extra installs intended only for t5x based images. Am removing those sections for now and rerunning pax build: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6464588951 |
Example images from run:
ghcr.io/nvidia/jax-toolbox-internal:6414016731-pax-multiarch
T5x will not be be able to be built multiarch until #252 is merged, so for now platforms is configurable and t5x is only built for x86