retentionPolicy counts deleted workflows #14081

Open
4 tasks done
martin-redmaple opened this issue Jan 14, 2025 · 1 comment
Labels: area/gc, type/bug

martin-redmaple commented Jan 14, 2025

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened? What did you expect to happen?

When evaluating whether a workflow should be deleted due to maxRetention limits, workflows that have already been deleted by another mechanism (e.g. CronWorkflow successfulJobsHistoryLimit) continue to be counted.

I would expect the maxRetention limit to only count workflows that have not been deleted.

Version(s)

13afac57453f

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

Reduce the retentionPolicy limits to make the issue easier to reproduce, e.g.

  # Workflow retention by number of workflows
  retentionPolicy: |
    completed: 10
    failed: 10
    errored: 10

Using a simple workflow e.g.

  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: hello-world-
    labels:
      workflows.argoproj.io/archive-strategy: "false"
    annotations:
      workflows.argoproj.io/description: |
        This is a simple hello world example.
  spec:
    entrypoint: hello-world
    templates:
    - name: hello-world
      container:
        image: busybox
        command: [echo]
        args: ["hello world"]

Run it 10 times and confirm that all 10 workflows are retained.

Now create a simple CronWorkflow that runs every minute, e.g.

  apiVersion: argoproj.io/v1alpha1
  kind: CronWorkflow
  metadata:
    name: hello-world
  spec:
    schedule: "* * * * *"
    timezone: "America/Los_Angeles"   # Default to local machine timezone
    startingDeadlineSeconds: 0
    concurrencyPolicy: "Replace"      # Default to "Allow"
    successfulJobsHistoryLimit: 4     # Default 3
    failedJobsHistoryLimit: 4         # Default 1
    suspend: false                    # Set to "true" to suspend scheduling
    workflowSpec:
      entrypoint: hello-world-with-time
      templates:
        - name: hello-world-with-time
          container:
            image: busybox
            command: [echo]
            args: ["🕓 hello world. Scheduled on: {{workflow.scheduledTime}}"]

Let this run and observe that once the workflows created by this CronWorkflow begin to be removed by the successfulJobsHistoryLimit (in this case 4), they are still considered when evaluating maxRetention.
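The effect described above can be reproduced in a small model. The sketch below is not the controller's code: `retentionQueue`, `simulate`, and the workflow names are hypothetical, and it assumes the retention bookkeeping only learns about completed workflows and is never told when the history limit removes one.

```go
package main

import "fmt"

// retentionQueue is a hypothetical, simplified model of per-phase retention
// bookkeeping. It only receives completed workflows; nothing tells it when a
// workflow is deleted by another mechanism.
type retentionQueue struct {
	limit   int
	tracked []string // completion order, oldest first
}

// enqueue records a completed workflow and returns the names that should be
// deleted because the tracked count now exceeds the limit.
func (q *retentionQueue) enqueue(name string) []string {
	q.tracked = append(q.tracked, name)
	var evicted []string
	for len(q.tracked) > q.limit {
		evicted = append(evicted, q.tracked[0])
		q.tracked = q.tracked[1:]
	}
	return evicted
}

// simulate runs `manual` one-off workflows followed by `cronRuns` cron runs.
// The history limit prunes old cron runs without informing the queue.
// It returns how many still-existing workflows were evicted while the number
// of existing workflows was at or below the limit, and how many remain.
func simulate(limit, manual, cronRuns, historyLimit int) (premature, liveAtEnd int) {
	q := &retentionQueue{limit: limit}
	live := map[string]bool{}
	var cronHistory []string

	apply := func(evicted []string) {
		for _, name := range evicted {
			if live[name] && len(live) <= limit {
				premature++
			}
			delete(live, name) // no-op for entries that were already gone
		}
	}

	for i := 0; i < manual; i++ {
		name := fmt.Sprintf("hello-world-manual-%d", i)
		live[name] = true
		apply(q.enqueue(name))
	}
	for i := 0; i < cronRuns; i++ {
		name := fmt.Sprintf("hello-world-cron-%d", i)
		live[name] = true
		cronHistory = append(cronHistory, name)
		for len(cronHistory) > historyLimit {
			delete(live, cronHistory[0]) // history limit; the queue is not told
			cronHistory = cronHistory[1:]
		}
		apply(q.enqueue(name))
	}
	return premature, len(live)
}

func main() {
	premature, live := simulate(10, 10, 20, 4)
	fmt.Printf("premature evictions: %d, workflows left: %d\n", premature, live)
}
```

Calling `simulate(10, 10, 20, 4)` mirrors the limits in this report. In this model, six one-off workflows are evicted while ten or fewer workflows exist, and only the four most recent cron runs remain at the end.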

Logs from the workflow controller

time="2025-01-14T20:55:10.046Z" level=info msg="Workflow to be dehydrated" Workflow Size=1542
time="2025-01-14T20:55:10.060Z" level=info msg="Workflow update successful" namespace=argo phase=Succeeded resourceVersion=17686 workflow=hello-world-1736888100
--> time="2025-01-14T20:55:10.061Z" level=info msg="Queueing Succeeded workflow argo/hello-world-tjsjx for delete due to max rention(10 workflows)" <-- At this point there are not 10 workflows
time="2025-01-14T20:55:10.061Z" level=info msg="Deleting garbage collected workflow 'argo/hello-world-tjsjx'"
time="2025-01-14T20:55:10.061Z" level=info msg="Queueing Succeeded workflow argo/hello-world-1736888100 for delete in 24h0m0s due to TTL"
time="2025-01-14T20:55:10.072Z" level=info msg="Successfully request 'argo/hello-world-tjsjx' to be deleted"

Logs from your workflow's wait container

N/A for this issue

@shuangkun (Member) commented:

It seems that workflow deletions also need to be observed here.

_, err = wfInformer.AddEventHandler(cache.FilteringResourceEventHandler{
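One way to read this suggestion, as a sketch only: the retention bookkeeping would also react to delete events, so a workflow removed by any mechanism stops counting toward the limit. The example below uses only the standard library; `liveTracker`, `onComplete`, and `onDelete` are hypothetical names, not identifiers from the controller. In client-go terms, `onDelete` would correspond to a `DeleteFunc` on the handler passed to `AddEventHandler`; whether the existing handler already has one is not shown in the snippet above.

```go
package main

import "fmt"

// liveTracker is a hypothetical stand-in for retention bookkeeping that
// is kept in sync with deletions as well as completions.
type liveTracker struct {
	limit int
	order []string        // completion order, oldest first
	known map[string]bool // names currently tracked
}

func newLiveTracker(limit int) *liveTracker {
	return &liveTracker{limit: limit, known: map[string]bool{}}
}

// onComplete mirrors an add/update event for a completed workflow and
// returns the workflows that should be deleted to stay within the limit.
func (t *liveTracker) onComplete(name string) []string {
	if !t.known[name] {
		t.known[name] = true
		t.order = append(t.order, name)
	}
	var evict []string
	for len(t.order) > t.limit {
		oldest := t.order[0]
		t.order = t.order[1:]
		delete(t.known, oldest)
		evict = append(evict, oldest)
	}
	return evict
}

// onDelete mirrors a delete event: the workflow is forgotten so that it no
// longer counts toward the limit.
func (t *liveTracker) onDelete(name string) {
	if !t.known[name] {
		return
	}
	delete(t.known, name)
	for i, n := range t.order {
		if n == name {
			t.order = append(t.order[:i], t.order[i+1:]...)
			break
		}
	}
}

// count reports how many workflows are currently tracked.
func (t *liveTracker) count() int { return len(t.order) }

func main() {
	tr := newLiveTracker(3)
	tr.onComplete("a")
	tr.onComplete("b")
	tr.onComplete("c")
	tr.onDelete("b") // removed by another mechanism, e.g. a history limit
	evicted := tr.onComplete("d")
	fmt.Println("tracked:", tr.count(), "evicted:", evicted)
}
```

Because "b" is forgotten before "d" completes, the tracked count stays at the limit and nothing is evicted; only a further completion would push out the oldest remaining entry.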

shuangkun added the area/gc label on Jan 15, 2025