flake: failing during image pull when building podinfo-flux package in test-external #3194
Comments
I've validated that this is not caused by disk space, as the error in that case looks different:
failed to create package: All attempts fail:
#1: error writing layer: write
/tmp/zarf-2081439845/images/blobs/sha256/000f791482e95f5e804ace91e5d39e0d48723c758a6adc740738cc1f9cd296153189335322:
no space left on device
#2: error writing layer: write
/tmp/zarf-2081439845/images/blobs/sha256/000f791482e95f5e804ace91e5d39e0d48723c758a6adc740738cc1f9cd296152478328656:
no space left on devic
I ran a script that builds the podinfo-flux package (what the test flakes on) 100 times in two different terminals in parallel. I was not able to reproduce this error. @RothAndrew reported that a similar error happens to him in his day-to-day work with a separate private package. It does not happen to him if the images are not in the Zarf cache. That fits: in our usual e2e tests we delete the Zarf cache right away to save storage, which is likely why the flake only appears in the test-external workflow.
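For anyone who wants to retry the reproduction, here is a minimal sketch of that kind of loop. It is not the script that was actually used. `count_failures` and `BUILD_CMD` are hypothetical names, and the command assumes `zarf` is on the PATH and that it is run from the root of a Zarf checkout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

# Assumed build command; adjust the package path to your checkout.
BUILD_CMD = ["zarf", "package", "create", "examples/podinfo-flux", "--confirm"]


def run_once(cmd: Sequence[str]) -> int:
    """Run the command once and return its exit code."""
    return subprocess.run(list(cmd), capture_output=True).returncode


def count_failures(cmd: Sequence[str], runs: int = 100, workers: int = 2) -> int:
    """Run cmd `runs` times across `workers` parallel workers and
    return how many runs exited with a non-zero status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda _: run_once(cmd), range(runs)))
    return sum(1 for code in codes if code != 0)
```

Calling `count_failures(BUILD_CMD)` mirrors the "100 builds, two at a time" setup. Since the flake seems to depend on the cache already holding the images, the cache should be left in place between runs.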
It happens so persistently for me that I ended up doing this pretty much anywhere I’m making zarf packages now. https://github.com/defenseunicorns-partnerships/wfapi/blob/main/scripts/build_zarf_package.sh
I wonder if the image size or number of layers makes a difference when trying to recreate. Podinfo is much smaller than most of the images I work with.
@RothAndrew Has it ever happened with only one image? Every failure I looked at in test-external (https://github.com/zarf-dev/zarf/actions/workflows/test-external.yml?query=is%3Afailure) fails with these images:
- ghcr.io/fluxcd/helm-controller:v1.1.0
- ghcr.io/fluxcd/image-automation-controller:v0.39.0
- ghcr.io/fluxcd/image-reflector-controller:v0.33.0
- ghcr.io/fluxcd/kustomize-controller:v1.4.0
- ghcr.io/fluxcd/notification-controller:v1.4.0
- ghcr.io/fluxcd/source-controller:v1.4.1
I'm not sure. I feel like it definitely happens more when there are multiple images, when the images are large, or when the registry it is pulling from is slow.
Pretty sure I found the issue: Zarf was not properly deleting invalid layers from the cache when they occur. @RothAndrew Feel free to test out #3358, though either way the team will see in time whether the flake disappears.
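To illustrate the idea (this is not the change in #3358, and not Zarf's code), here is a minimal sketch of pruning invalid layers from a content-addressed cache. It assumes the cache uses a `blobs/sha256/<hex digest>` layout like the paths in the error log above, and that a blob is invalid when its content no longer hashes to its file name. `prune_invalid_blobs` is a hypothetical helper.

```python
import hashlib
from pathlib import Path
from typing import List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prune_invalid_blobs(cache_dir: Path) -> List[Path]:
    """Delete blobs whose content does not match their digest file name.

    Assumes an OCI-style layout: <cache_dir>/blobs/sha256/<hex digest>.
    Returns the paths that were removed.
    """
    blob_dir = cache_dir / "blobs" / "sha256"
    removed: List[Path] = []
    if not blob_dir.is_dir():
        return removed
    for blob in sorted(blob_dir.iterdir()):
        if blob.is_file() and file_sha256(blob) != blob.name:
            blob.unlink()  # a truncated or corrupt layer must not be reused
            removed.append(blob)
    return removed
```

A partially written layer left behind by an interrupted pull would fail this check and be removed, so the next pull fetches it again instead of reusing the bad copy.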
Describe what should be investigated or refactored
Seeing a flake in the test-external workflow. Images are failing to be saved.
Workflow run
Relevant logs: