AWS - Metaphor app fails to pull due to image name mismatch #1801

Closed
1 task done
alechp opened this issue Sep 5, 2023 · 11 comments
Labels
bug Something isn't working

Comments


alechp commented Sep 5, 2023

Which version of kubefirst are you using?

2.2.17

Which cloud provider?

AWS

Which installation type?

CLI

Which distributed Git provider?

GitHub

What is the issue?

Problem

screenshot_2023-09-01_at_4 28 07_pm

  • Metaphor image is successfully published
  • Image fails to pull due to name mismatch
    • Metaphor is checking ghcr.io/<GITHUB_ORG>/metaphor:hash
    • Gitops is publishing to ghcr.io/<GITHUB_ORG>/metaphor/metaphor:hash (note the duplicated metaphor segment)
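
To make the mismatch concrete, here is a minimal sketch that builds both references and compares their repository paths. The organisation name `example-org` and the tag `58bfec9` are placeholders chosen for illustration; only the two path shapes come from the report above.

```python
def image_ref(registry: str, org: str, path: str, tag: str) -> str:
    """Join the parts of a container image reference."""
    return f"{registry}/{org}/{path}:{tag}"


def repository(ref: str) -> str:
    """Return the reference without its tag (everything before the last ':')."""
    return ref.rsplit(":", 1)[0]


# Placeholder values: "example-org" and "58bfec9" are illustrative only.
pulled = image_ref("ghcr.io", "example-org", "metaphor", "58bfec9")
published = image_ref("ghcr.io", "example-org", "metaphor/metaphor", "58bfec9")

# Same tag, different repository path, so the pull looks in the wrong place.
print(repository(pulled))
print(repository(published))
print(repository(pulled) == repository(published))
```

The tag matches in both cases; only the repository path differs, which is consistent with the image being published successfully and still failing to pull.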

image

This is similar to the GCP issue #1774. @cameronraysmith submitted a fix for AWS in #1797.

I believe @claywd is currently looking into this (screenshots below for context)

Additional context

Screenshot 2023-09-04 at 5 33 03 PM Screenshot 2023-09-04 at 5 33 17 PM Screenshot 2023-09-04 at 5 32 30 PM

Related issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
Author

alechp commented Sep 5, 2023

  • The fixes I originally tried (the URL change suggested by @cameronraysmith and the application.yaml change modeled on the changes being made for Civo) resulted in a StatefulSet error:
Screenshot 2023-09-04 at 6 51 22 PM More context here: https://github.com//pull/1797#issuecomment-1705810767
  • Tried digging into registry (since metaphor's image pull issue revolves around ghcr) in the gitops-template repo and noticed that it's unavailable:
Screenshot 2023-09-04 at 6 50 51 PM Screenshot 2023-09-04 at 6 50 56 PM

Contributor

fharper commented Sep 6, 2023

Thanks for reporting this issue @alechp: someone will review @cameronraysmith's PR soon.

Author

alechp commented Sep 12, 2023

Tried the build again. Getting no such file or directory for registry/development/components/development/metaphor/Chart.yaml:

Screenshot 2023-09-11 at 8 15 04 PM

When I dig into the gitops repo, I notice that the specified --cluster-name passed to kubefirst aws create is missing from the path that's being checked:
Screenshot 2023-09-11 at 8 16 07 PM

Full log

metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: setting wrapper Chart.yaml to version: 0.0.1-rc.58bfec9
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: sed: can't read registry/development/components/development/metaphor/Chart.yaml: No such file or directory
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: time="2023-09-12T03:12:17.257Z" level=info msg="sub-process exited" argo=true error="<nil>"
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: time="2023-09-12T03:12:17.258Z" level=info msg="/src -> /var/run/argo/outputs/artifacts/src.tgz" argo=true
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: time="2023-09-12T03:12:17.258Z" level=info msg="Taring /src"
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: time="2023-09-12T03:12:17.381Z" level=info msg="archived 293 files/dirs in /src"
metaphor-development-58bfec9-gttdz-set-environment-version-3227680445: Error: exit status 2
metaphor-development-58bfec9-gttdz Failed at 2023-09-12 03:12:24 +0000 UTC
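
The failing step can be reproduced in miniature. The sketch below is a Python stand-in for the `sed -i` call, using the substitution pattern that appears in the executor log quoted later in this thread. The directory layout, the `my-cluster` name, and the Chart.yaml contents are assumptions made up for the demonstration.

```python
import re
import tempfile
from pathlib import Path


def set_wrapper_version(gitops_root: Path, chart_path: str, version: str) -> None:
    """Rewrite every two-space-indented `version:` line, roughly like `sed -i`."""
    chart = gitops_root / chart_path
    # read_text raises FileNotFoundError when the path does not exist,
    # which corresponds to sed's "No such file or directory".
    text = chart.read_text()
    chart.write_text(re.sub(r"  version:.*", lambda _: f"  version: {version}", text))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: the chart lives under the real cluster name.
    good = "registry/my-cluster/components/development/metaphor/Chart.yaml"
    (root / good).parent.mkdir(parents=True)
    (root / good).write_text("dependencies:\n- name: metaphor\n  version: 0.0.1\n")

    set_wrapper_version(root, good, "0.0.1-rc.58bfec9")
    updated = (root / good).read_text()

    # The path from the log uses "development" where the cluster name belongs.
    bad = "registry/development/components/development/metaphor/Chart.yaml"
    try:
        set_wrapper_version(root, bad, "0.0.1-rc.58bfec9")
        missing = False
    except FileNotFoundError:
        missing = True

print(updated)
print("missing:", missing)
```

The substitution itself works; the step only fails because the path it is handed does not exist in the gitops repo.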

Context

  • running from source, not 2.2.17
  • This is the last commit on kubefirst as of testing this script:

38c4d4f

Author

alechp commented Sep 12, 2023

The issue from #1801 (comment) is caused by the metaphor/.github/workflows/main.yaml workflow setting the cluster name to the environment name.

Screenshot 2023-09-11 at 8 32 16 PM

I fixed this by updating the fullChartPath parameter in metaphor/.argo/deploy.yaml from

registry/{{workflow.parameters.clusterName}}/components/{{workflow.parameters.environment}}/{{workflow.parameters.appName}}/Chart.yaml

to

You can see this matches the generated value:
Screenshot 2023-09-11 at 8 34 50 PM

The actual fix would be to update the -p clusterName in each stage (inside .github/workflows/main.yaml).

This would also address the subsequent failure in metaphor/.argo/release.yaml, which also references workflow.parameters.clusterName:
Screenshot 2023-09-11 at 8 39 38 PM

After doing this, release throws a 409:

metaphor-release-d2c54ff-z8vqv-publish-chart-2625185248: "kubefirst" has been added to your repositories
metaphor-release-d2c54ff-z8vqv-publish-chart-2625185248: Pushing metaphor-0.0.1.tgz to kubefirst...
metaphor-release-d2c54ff-z8vqv-publish-chart-2625185248: Error: 409: chart already exists
Screenshot 2023-09-11 at 8 47 14 PM

Author

alechp commented Sep 12, 2023

The following manual changes get the GitHub Actions workflow to pass all the way through release:

  1. fullChartPath value in metaphor/.argo/deploy.yaml and metaphor/.argo/release.yaml (2 files, same update in both):
    From:
              parameters:
                - name: fullChartPath
                  value: 'registry/{{workflow.parameters.clusterName}}/components/{{workflow.parameters.environment}}/{{workflow.parameters.appName}}/Chart.yaml'

To:

              parameters:
                - name: fullChartPath
                  value: 'registry/WHATEVER_VALUE_YOU_PASSED_TO_--cluster-name_IN_CLI/components/{{workflow.parameters.environment}}/{{workflow.parameters.appName}}/Chart.yaml'
  2. For the sake of consistency (although this doesn't fix the problem by itself, which is why I list it second), update the clusterName in staging, development and production. It needs to be updated once for each stage:
    From:
      - name: development
        run: |
          echo "commit sha $GITHUB_SHA"
          argo version --short
          argo submit .argo/deploy.yaml \
            --generate-name="${GITHUB_REPOSITORY_NAME_PART}-development-${GITHUB_SHA_SHORT}-" \
            -p appName="${GITHUB_REPOSITORY_NAME_PART}" \
            -p branch="${GITHUB_REF_NAME}" \
            -p clusterName="development" \
            -p environment="development" \
            -p gitUrlNoProtocol="git@github.com:${GITHUB_REPOSITORY_OWNER_PART_SLUG}" \
            -p shortSha="${GITHUB_SHA_SHORT}" \
            --wait --log

To:

      - name: development
        run: |
          echo "commit sha $GITHUB_SHA"
          argo version --short
          argo submit .argo/deploy.yaml \
            --generate-name="${GITHUB_REPOSITORY_NAME_PART}-development-${GITHUB_SHA_SHORT}-" \
            -p appName="${GITHUB_REPOSITORY_NAME_PART}" \
            -p branch="${GITHUB_REF_NAME}" \
            -p clusterName="WHATEVER_VALUE_YOU_PASSED_TO_--cluster-name_IN_CLI" \
            -p environment="development" \
            -p gitUrlNoProtocol="git@github.com:${GITHUB_REPOSITORY_OWNER_PART_SLUG}" \
            -p shortSha="${GITHUB_SHA_SHORT}" \
            --wait --log
  3. Add whitespace to the metaphor Chart
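
The effect of the clusterName parameter on fullChartPath can be sketched as follows. The `render` helper is a simplified stand-in for Argo's parameter substitution, not Argo's own implementation, and `my-cluster` is a placeholder for whatever value was passed to --cluster-name.

```python
import re

FULL_CHART_PATH = (
    "registry/{{workflow.parameters.clusterName}}/components/"
    "{{workflow.parameters.environment}}/{{workflow.parameters.appName}}/Chart.yaml"
)


def render(template: str, parameters: dict) -> str:
    """Substitute {{workflow.parameters.NAME}} tags with values from `parameters`."""
    return re.sub(
        r"\{\{workflow\.parameters\.(\w+)\}\}",
        lambda m: parameters[m.group(1)],
        template,
    )


# clusterName as submitted by the workflow stage quoted above.
as_shipped = render(
    FULL_CHART_PATH,
    {"clusterName": "development", "environment": "development", "appName": "metaphor"},
)

# clusterName set to the (placeholder) value given to --cluster-name.
as_fixed = render(
    FULL_CHART_PATH,
    {"clusterName": "my-cluster", "environment": "development", "appName": "metaphor"},
)

print(as_shipped)
print(as_fixed)
```

The first rendered path is the one `sed` could not read in the earlier log; the second is the shape the gitops repo actually contains, assuming the registry directory is keyed by cluster name as the screenshots suggest.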

Context:
Screenshot 2023-09-12 at 2 59 28 PM
Screenshot 2023-09-12 at 2 59 46 PM

After all is said and done:
Screenshot 2023-09-12 at 3 01 35 PM

Conclusion

  1. Need to make sure the find/replace for <CLUSTER_NAME> in the config points to the actual --cluster-name value
  2. It would be nice to be able to bypass the ChartMuseum 409 "chart already exists" error without committing a whitespace change. Passing a force flag via the k1 CLI is one option (re: discussion with @claywd about the ChartMuseum 409 error; related GH issue: Push force chartmuseum/helm-push#5)
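
As a toy model of the second point, the sketch below shows a repository that rejects a repeated name/version pair unless the caller forces the push. `ToyChartRepo`, `ChartExists`, and the `force` argument are hypothetical names for illustration; this is not ChartMuseum's API or the k1 CLI.

```python
class ChartExists(Exception):
    """Raised when a chart version is already stored (the 409 case)."""


class ToyChartRepo:
    """In-memory stand-in for a chart repository keyed by (name, version)."""

    def __init__(self) -> None:
        self._charts = {}

    def push(self, name: str, version: str, payload: bytes, force: bool = False) -> None:
        key = (name, version)
        if key in self._charts and not force:
            raise ChartExists(f"409: chart already exists: {name}-{version}")
        # A forced push overwrites the stored payload.
        self._charts[key] = payload

    def get(self, name: str, version: str) -> bytes:
        return self._charts[(name, version)]


repo = ToyChartRepo()
repo.push("metaphor", "0.0.1", b"first build")

try:
    repo.push("metaphor", "0.0.1", b"second build")
    rejected = False
except ChartExists as err:
    rejected = True
    print(err)

repo.push("metaphor", "0.0.1", b"second build", force=True)
print(repo.get("metaphor", "0.0.1"))
```

Whether overwriting a published version is desirable is a separate question; the sketch only illustrates the behaviour being requested.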

Author

alechp commented Sep 13, 2023

Despite the GH Action succeeding, the deployment still fails. Publish failed multiple times, and dev/stage/prod have been stuck as pending indefinitely (I left it running since yesterday).

Screenshot 2023-09-13 at 1 35 51 PM

Cluster Screenshots

Screenshot 2023-09-13 at 1 01 31 PM
Screenshot 2023-09-13 at 1 04 27 PM
  • Only metaphor (dev/stage/prod namespaces)
Screenshot 2023-09-13 at 1 05 11 PM

Author

alechp commented Sep 19, 2023

Updated build script to include two extra flags:

--gitops-template-url https://github.com/kubefirst/gitops-template
--gitops-template-branch main

Before metaphor even loads, I'm getting an error on metaphor-development-[hash]-[hash]-set-environment-version-[hash]

Screenshot 2023-09-18 at 6 57 47 PM

Pertinent error from logs:

time="2023-09-19T01:19:50.226Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo-server\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
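
To read the warning more easily, here is a small sketch that pulls the service account, verb, resource, API group, and namespace out of the "forbidden" message. The regular expression is written against this one message and is an assumption about its shape, not a general parser for Kubernetes errors.

```python
import re

FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<sa_namespace>[^:"]+):(?P<service_account>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<api_group>[^"]*)" '
    r'in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(message: str) -> dict:
    """Extract the fields of a Kubernetes 'is forbidden' message as a dict."""
    match = FORBIDDEN.search(message)
    if match is None:
        raise ValueError("not a recognised 'forbidden' message")
    return match.groupdict()


# The error text from the log line above, with the shell escaping removed.
message = (
    'workflowtaskresults.argoproj.io is forbidden: User '
    '"system:serviceaccount:argo:argo-server" cannot create resource '
    '"workflowtaskresults" in API group "argoproj.io" in the namespace "argo"'
)

for field, value in parse_forbidden(message).items():
    print(f"{field}: {value}")
```

The extracted fields name what an RBAC rule would have to allow. Note that the log line itself is a warning and reports a fallback to the legacy pod patch.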

This seems to be what kicks off the initial errors that we saw above, which still exist in the current version. Namely:

metaphor-development-32f8779-65f8z-set-environment-version-3899486693: setting wrapper Chart.yaml to version: 0.0.1-rc.32f8779
metaphor-development-32f8779-65f8z-set-environment-version-3899486693: sed: can't read registry/development/components/development/metaphor/Chart.yaml: No such file or directory
Screenshot 2023-09-18 at 7 05 59 PM

Full logs for that failed pod:

time="2023-09-19T01:19:47.789Z" level=info msg="Starting Workflow Executor" version=v3.4.1
time="2023-09-19T01:19:47.796Z" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0.5 Steps=5
time="2023-09-19T01:19:47.796Z" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC" includeScriptOutput=false namespace=argo podName=metaphor-development-32f8779-65f8z-set-environment-version-3899486693 template="{\"name\":\"set-environment-version\",\"inputs\":{\"parameters\":[{\"name\":\"chartVersion\",\"value\":\"0.0.1-rc.32f8779\"},{\"name\":\"environment\",\"value\":\"development\"},{\"name\":\"fullChartPath\",\"value\":\"registry/development/components/development/metaphor/Chart.yaml\"}],\"artifacts\":[{\"name\":\"repo-source\",\"path\":\"/src\",\"s3\":{\"key\":\"argo-workflows/artifacts/2023/09/19/073bca0e-d574-465a-995d-3e53cec87ebd/metaphor-development-32f8779-65f8z/metaphor-development-32f8779-65f8z-checkout-with-gitops-ssh-2115738426/repo-source.tgz\"}}]},\"outputs\":{\"artifacts\":[{\"name\":\"repo-source\",\"path\":\"/src\"}]},\"metadata\":{},\"script\":{\"name\":\"\",\"image\":\"kubefirst/chubbo:0.2\",\"command\":[\"bash\"],\"workingDir\":\"/src/gitops\",\"resources\":{},\"source\":\"set -e\\necho \\\"setting wrapper Chart.yaml to version: 0.0.1-rc.32f8779\\\"\\nsed -i \\\"s/  version:.*/  version: 0.0.1-rc.32f8779/g\\\" \\\"registry/development/components/development/metaphor/Chart.yaml\\\"\\necho \\\"updated development wrapper chart version to 0.0.1-rc.32f8779\\\"\\n\"},\"archiveLocation\":{\"archiveLogs\":false,\"s3\":{\"endpoint\":\"s3.amazonaws.com\",\"bucket\":\"k1-artifacts-capswancloud-lxwgx5\",\"region\":\"us-east-1\",\"insecure\":false,\"accessKeySecret\":{\"key\":\"accesskey\"},\"secretKeySecret\":{\"key\":\"secretkey\"},\"useSDKCreds\":true,\"encryptionOptions\":{},\"key\":\"argo-workflows/artifacts/2023/09/19/073bca0e-d574-465a-995d-3e53cec87ebd/metaphor-development-32f8779-65f8z/metaphor-development-32f8779-65f8z-set-environment-version-3899486693\"}}}" 
version="&Version{Version:v3.4.1,BuildDate:2022-10-01T15:03:42Z,GitCommit:0546fef0b096d84c9e3362d2b241614e743ebe97,GitTag:v3.4.1,GitTreeState:clean,GoVersion:go1.18.6,Compiler:gc,Platform:linux/amd64,}"
time="2023-09-19T01:19:47.796Z" level=info msg="Starting deadline monitor"
time="2023-09-19T01:19:49.796Z" level=info msg="Main container completed" error="<nil>"
time="2023-09-19T01:19:49.796Z" level=info msg="No Script output reference in workflow. Capturing script output ignored"
time="2023-09-19T01:19:49.796Z" level=info msg="No output parameters"
time="2023-09-19T01:19:49.796Z" level=info msg="Saving output artifacts"
time="2023-09-19T01:19:49.797Z" level=info msg="Staging artifact: repo-source"
time="2023-09-19T01:19:49.797Z" level=info msg="Staging /src from mirrored volume mount /mainctrfs/src"
time="2023-09-19T01:19:49.797Z" level=info msg="Taring /mainctrfs/src"
time="2023-09-19T01:19:49.913Z" level=info msg="archived 293 files/dirs in /mainctrfs/src"
time="2023-09-19T01:19:49.914Z" level=info msg="Successfully staged /src from mirrored volume mount /mainctrfs/src"
time="2023-09-19T01:19:49.914Z" level=info msg="S3 Save path: /tmp/argo/outputs/artifacts/repo-source.tgz, key: argo-workflows/artifacts/2023/09/19/073bca0e-d574-465a-995d-3e53cec87ebd/metaphor-development-32f8779-65f8z/metaphor-development-32f8779-65f8z-set-environment-version-3899486693/repo-source.tgz"
time="2023-09-19T01:19:49.914Z" level=info msg="Creating minio client using AWS SDK credentials"
time="2023-09-19T01:19:50.037Z" level=info msg="Saving file to s3" bucket=k1-artifacts-capswancloud-lxwgx5 endpoint=s3.amazonaws.com key=argo-workflows/artifacts/2023/09/19/073bca0e-d574-465a-995d-3e53cec87ebd/metaphor-development-32f8779-65f8z/metaphor-development-32f8779-65f8z-set-environment-version-3899486693/repo-source.tgz path=/tmp/argo/outputs/artifacts/repo-source.tgz
time="2023-09-19T01:19:50.218Z" level=info msg="Save artifact" artifactName=repo-source duration=304.271429ms error="<nil>" key=argo-workflows/artifacts/2023/09/19/073bca0e-d574-465a-995d-3e53cec87ebd/metaphor-development-32f8779-65f8z/metaphor-development-32f8779-65f8z-set-environment-version-3899486693/repo-source.tgz
time="2023-09-19T01:19:50.218Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/artifacts/repo-source.tgz
time="2023-09-19T01:19:50.218Z" level=info msg="Successfully saved file: /tmp/argo/outputs/artifacts/repo-source.tgz"
time="2023-09-19T01:19:50.226Z" level=info msg="Create workflowtaskresults 403"
time="2023-09-19T01:19:50.226Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:argo:argo-server\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"argo\""
time="2023-09-19T01:19:50.252Z" level=info msg="Patch pods 200"
time="2023-09-19T01:19:50.256Z" level=info msg="stopping progress monitor (context done)" error="context canceled"
time="2023-09-19T01:19:50.256Z" level=info msg="Deadline monitor stopped"
time="2023-09-19T01:19:50.257Z" level=info msg="Alloc=8355 TotalAlloc=24180 Sys=22994 NumGC=6 Goroutines=9"

Contributor

fharper commented Sep 19, 2023

There's an active discussion about this on the Slack community at https://kubefirst.slack.com/archives/C03U34WJ7FW/p1694539096211389

Author

alechp commented Sep 20, 2023

Status update from @claywd on Slack: @johndietz is working on this

Screenshot 2023-09-20 at 12 18 24 PM

Author

alechp commented Oct 2, 2023

Status update; blocked pending v2.3.0:
Screenshot 2023-10-02 at 11 29 57 AM

Will test again after #1832 is merged

Author

alechp commented Nov 4, 2023

Confirmed Metaphor is working now (tested with kubefirst v2.3.5)

CC: @jarededwards @fharper @cameronraysmith

Screenshot 2023-11-05 at 1 54 49 PM
