Parts artifacts in the map-reduce example are not garbage collected #14091
Comments
Looks like it should be garbage collected.
I think I understand what the issue is. At https://github.com/argoproj/argo-workflows/blob/19b2322/workflow/artifacts/s3/s3.go#L188 the S3 driver checks whether the path has a trailing slash to decide if it's a folder or a single file. In this case it is a folder, but the example does not put a trailing slash at the end. A better implementation could check against the S3 API whether the artifact is actually a folder. If the current implementation is kept, I think the documentation should show the proper way to produce folder artifacts, with a trailing slash, to avoid confusion. edit: I just saw the comment
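For context, here is a minimal sketch of the heuristic under discussion and the proposed fallback probe. The names `isDirectory`, `lister`, and `hasObjectsWithPrefix` are hypothetical and not the actual `s3.go` code; the real driver would call its S3 client where the `lister` interface appears here.

```go
package main

import (
	"fmt"
	"strings"
)

// lister is a hypothetical abstraction over the S3 client's
// list-objects call.
type lister interface {
	// hasObjectsWithPrefix reports whether any object key starts
	// with the given prefix (this costs one extra S3 API call).
	hasObjectsWithPrefix(prefix string) (bool, error)
}

// isDirectory sketches the proposed logic: trust a trailing slash,
// but for slash-less keys probe the bucket instead of assuming
// "single file", so folder artifacts without a trailing slash can
// still be deleted recursively.
func isDirectory(c lister, key string) (bool, error) {
	if strings.HasSuffix(key, "/") {
		return true, nil // current heuristic: trailing slash means folder
	}
	// Fallback probe: does anything live under "key/"?
	return c.hasObjectsWithPrefix(key + "/")
}

// fakeLister simulates a bucket's key listing for demonstration.
type fakeLister struct{ keys []string }

func (f fakeLister) hasObjectsWithPrefix(prefix string) (bool, error) {
	for _, k := range f.keys {
		if strings.HasPrefix(k, prefix) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	bucket := fakeLister{keys: []string{"wf/parts/a.json", "wf/parts/b.json"}}
	dir, _ := isDirectory(bucket, "wf/parts") // no trailing slash in the key
	fmt.Println(dir)                          // prints "true"
}
```

The trade-off the thread raises is exactly the extra list call in the fallback path; the current driver avoids it by relying on the trailing slash alone.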
I have just confirmed that when a trailing slash is appended to the key, the parts folder is also cleaned when the artifacts are garbage collected. I'll create a PR to fix the example YAML.
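The fix amounts to appending a slash to the folder artifact's key. A sketch of what that looks like (the field layout and key template here are illustrative, not copied from map-reduce.yaml):

```yaml
outputs:
  artifacts:
    - name: parts
      s3:
        # The trailing slash is what tells the S3 driver this key is a
        # folder, so garbage collection deletes its contents recursively.
        key: "{{workflow.name}}/parts/"
```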
Good discovery!
I still think that a missing trailing slash on a folder artifact silently resulting in an ever-growing S3 bucket is a foot-gun. It might be worth an additional S3 API call to check whether the key is a folder or not.
Agreed. Although the current approach reduces the number of requests, this way of judging is not correct.
The weird thing is, the Azure driver does check whether the artifact is a folder. The GCS driver does not explicitly handle folders at all; maybe the client itself handles it, I'm not familiar with it. I think the behavior should be symmetrical across all implementations. Should we create a separate issue for it? I might take a swing at fixing their implementations.
Pre-requisites

- I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.

What happened? What did you expect to happen?
I have a default workflow configuration that has:
and I am directly applying this map-reduce example without any modifications (except adding a namespace).
The workflow runs smoothly without any problems:
and creates these artifacts on S3:
after I delete the workflow manually, the workflow controller seems to think it garbage collected everything:
alas, the parts/ folder remains on S3, dangling indefinitely.

I expected all the artifacts to be collected properly.
Version(s)
v3.6.2, latest (74ed09303da63b29fc08319236e2ae412269dd4bc0c919ca802bbf75e205e898)
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
https://github.com/argoproj/argo-workflows/blob/dd19e49/examples/map-reduce.yaml
Logs from the workflow controller
Logs from your workflow's wait container