Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates rosetta CI to build multiarch (Arm64/amd64) for pax #284

Merged
merged 8 commits into from
Oct 10, 2023

Conversation

terrykong
Copy link
Contributor

@terrykong terrykong commented Oct 5, 2023

Example images from run:

  • pax: ghcr.io/nvidia/jax-toolbox-internal:6414016731-pax-multiarch

T5x will not be be able to be built multiarch until #252 is merged, so for now platforms is configurable and t5x is only built for x86

@terrykong terrykong changed the title Updates rosetta to build multiarch (Arm64/amd64) for both t5x and pax Updates rosetta CI to build multiarch (Arm64/amd64) for both t5x and pax Oct 5, 2023
@terrykong terrykong changed the title Updates rosetta CI to build multiarch (Arm64/amd64) for both t5x and pax Updates rosetta CI to build multiarch (Arm64/amd64) for pax Oct 6, 2023
@terrykong terrykong requested a review from yhtang October 6, 2023 03:01

- name: Pull and validate BASE_IMAGE=${{ env.BASE_IMAGE }} contains BASE_LIBRARY=${{ inputs.BASE_LIBRARY }}
- name: Pull and validate BASE_IMAGE=${{ steps.defaults.outputs.BASE_IMAGE }} contains BASE_LIBRARY=${{ inputs.BASE_LIBRARY }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity mainly. I was thinking in the future with OCI annotations, we could just inspect the environment vars: https://github.com/NVIDIA/JAX-Toolbox/blob/main/.github/workflows/scripts/get_remote_env.sh and then avoid us having to pull the image; but didn't implement that yet.

Copy link
Contributor Author

@terrykong terrykong Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I failed to mention why it's required; since (rosetta) pax does not re-install paxml, having paxml already installed in editable is required for the rosetta build at the moment. Once we have reproducible builds addressed, we can probably toss this step out b/c we'll just reinstall it after the distribution has been formed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the need for the check is transient and will be eliminated soon with pip-compile, I'd recommend that we simply don't do it. The build process will also fail if the library is not present, right? It should be fine as long as the wrong library does not cause a silent failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@yhtang
Copy link
Collaborator

yhtang commented Oct 9, 2023

The tests failed, likely due to external factors. Rerunning to confirm,.

@yhtang
Copy link
Collaborator

yhtang commented Oct 9, 2023

@terrykong could you please confirm if the build and tests are now successful?

@terrykong
Copy link
Contributor Author

terrykong commented Oct 9, 2023

edit: The pax build is okay, the failure is in E2E time which can fail from time-to-time. T5x is failing due to a build error, which I have a fix for

@terrykong
Copy link
Contributor Author

terrykong commented Oct 9, 2023

@terrykong
Copy link
Contributor Author

Not sure why this wasn't caught earlier, but rosetta-pax fails b/c of extra installs intended only for t5x based images. Am removing those sections for now and rerunning pax build: https://github.com/NVIDIA/JAX-Toolbox/actions/runs/6464588951

@yhtang yhtang merged commit 5c7351e into main Oct 10, 2023
75 of 77 checks passed
@yhtang yhtang deleted the multiarch-rosetta branch October 10, 2023 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants