Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Release Base Images to #1607

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions .github/workflows/release.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -77,10 +77,16 @@ jobs:
TAG_NAME=$(echo "${BRANCH_NAME}" | sed 's/\//_/g')
echo "BRANCH_NAME=${BRANCH_NAME}" >> $GITHUB_ENV

echo "DOCKER_TAG=mosaicml/llm-foundry:release_${TAG_NAME}" >> $GITHUB_ENV
echo "AWS_DOCKER_TAG=mosaicml/llm-foundry:release_${TAG_NAME}_aws" >> $GITHUB_ENV
echo "LATEST_TAG=mosaicml/llm-foundry:release-latest" >> $GITHUB_ENV
echo "AWS_LATEST_TAG=mosaicml/llm-foundry:release_aws-latest" >> $GITHUB_ENV
# Don't trigger formal release if manual trigger
if [[ "${GITHUB_EVENT_NAME}" == "push" && "${GITHUB_REF}" =~ ^refs/tags/v ]]; then
echo "DOCKER_TAG=mosaicml/llm-foundry:release_${TAG_NAME}" >> $GITHUB_ENV
echo "AWS_DOCKER_TAG=mosaicml/llm-foundry:release_${TAG_NAME}_aws" >> $GITHUB_ENV
echo "LATEST_TAG=mosaicml/llm-foundry:release-latest" >> $GITHUB_ENV
echo "AWS_LATEST_TAG=mosaicml/llm-foundry:release_aws-latest" >> $GITHUB_ENV
else
echo "DOCKER_TAG=mosaicml/llm-foundry:test_release_${TAG_NAME}" >> $GITHUB_ENV
echo "AWS_DOCKER_TAG=mosaicml/llm-foundry:test_release_${TAG_NAME}_aws" >> $GITHUB_ENV
fi


- name: Build and push AWS Docker image
Expand All @@ -93,7 +99,7 @@ jobs:
${{ env.AWS_DOCKER_TAG }}
${{ env.AWS_LATEST_TAG }}
build-args: |
BASE_IMAGE=mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04-aws
BASE_IMAGE=mosaicml/llm-foundry:2.4.0_cu124-latest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is base image the same image ur building...?

Copy link
Contributor Author

@KuuCi KuuCi Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're different images, the images built here are
mosaicml/llm-foundry:release_${TAG_NAME}" and mosaicml/llm-foundry:release_latest"
to keep up with foundry releases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait yeah, how are we building off the foundry images? that should be the result of this right

Copy link
Contributor Author

@KuuCi KuuCi Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh. Huh, I guess the behavior is slightly weird.

  1. https://github.com/mosaicml/llm-foundry/blob/main/.github/workflows/docker.yaml is what builds mosaicml/llm-foundry:2.4.0_cu124-latest (which is largely cached) -- not necessarily the same pr (could be the previous pr)
  2. https://github.com/mosaicml/llm-foundry/blob/main/.github/workflows/release.yaml builds from that image. We don't cache here because it's almost guarenteed cache miss when we clone from our specific branch + pip install, but that doesn't matter bc the dependencies are mostly installed from the mosaicml/llm-foundry:2.4.0_cu124-latest image, and any that aren't install will be in dockerfile so we get net speed up
  3. We keep the foundry repo

but like isn't that fine?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But currently the release workflow builds on the pytorch image, not the foundry image...i guess the foundry image does exist though...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its worth optimizing release workflow to add this complication. This feels a bit weird to me to build on top of the foundry build image...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ agree

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can have some unintended complications -- for example when doing foundry release, suppose I create the release branch but don't publish the release and kick off the release workflow yet. Then, in between, someone else merges some breaking PR. Then I launch this new release workflow which builds on the cached latest foundry image that contains the bad commit....right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sounds fair, will close pr

BRANCH_NAME=${{ env.BRANCH_NAME }}
DEP_GROUPS=[all]
KEEP_FOUNDRY=true
Expand All @@ -108,7 +114,7 @@ jobs:
${{ env.DOCKER_TAG }}
${{ env.LATEST_TAG }}
build-args: |
BASE_IMAGE=mosaicml/pytorch:2.4.0_cu124-python3.11-ubuntu20.04
BASE_IMAGE=mosaicml/llm-foundry:2.4.0_cu124-latest
KuuCi marked this conversation as resolved.
Show resolved Hide resolved
BRANCH_NAME=${{ env.BRANCH_NAME }}
DEP_GROUPS=[all]
KEEP_FOUNDRY=true
Loading