Pre-requisites

- I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
- I have searched existing issues and could not find a match for this bug
What happened? What did you expect to happen?

When running a workflow with multiple synchronization locks, where one is a mutex and one is a semaphore, you can get into a case where the Pending message shown is misleading/confusing.

Since a workflow must wait for all of its locks to be available, and the queue is processed in order, if you have, say, 5 jobs in the queue that all rely on the same mutex, and then 5 more jobs behind them that rely on a different mutex, but all 10 rely on the same semaphore, the jobs with the different mutex will not run until all of the jobs relying on the first mutex finish.

That part is not the issue; it is expected and fully explained in the docs. However, the message the system returns for the jobs not relying on the mutex is

`fmt.Sprintf("Waiting for %s lock. Lock status: %d/%d", s.name, s.limit-len(s.lockHolder), s.limit)`

from https://github.com/argoproj/argo-workflows/blob/v3.6.2/workflow/sync/semaphore.go#L176.

What is misleading is the case where there is plenty of semaphore room, but the workflow is blocked by its position in the queue. You end up with a Pending message that says it is waiting for a lock while showing plenty of free lock slots, when it should really say something like "Waiting for position in queue".

This would help in troubleshooting why few jobs are running even though the semaphore has lots of room.
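As a sketch of what a clearer message could look like (this is not the actual workflow-controller code; the `pendingMessage` function and its `holders`, `limit`, and `queuePos` parameters are invented names for illustration), the controller could branch on whether the lock actually has free slots:

```go
package main

import "fmt"

// Sketch only, not Argo code: pick a pending message based on whether the
// lock is actually exhausted or the workflow is merely behind in the queue.
func pendingMessage(name string, holders, limit, queuePos int) string {
	free := limit - holders
	if free <= 0 {
		// No slots free: the existing message is accurate here.
		return fmt.Sprintf("Waiting for %s lock. Lock status: %d/%d", name, free, limit)
	}
	// Slots are free, so the workflow is only blocked by its queue position.
	return fmt.Sprintf("Waiting for queue position for %s lock (position %d). Lock status: %d/%d",
		name, queuePos, free, limit)
}

func main() {
	// 3 of 200 slots held, but the workflow is 7th in the queue.
	fmt.Println(pendingMessage("default/ConfigMap/my_config_map/lots_of_jobs", 3, 200, 7))
}
```

With something like this, a workflow stuck behind mutex holders would report its queue position instead of showing a lock with 197/200 free slots.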
Version(s)
v3.6.2
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Example workflow-level synchronization config, where the semaphore limit is set to, say, 200 and the mutex has 5 jobs with the same key in front of a job with a different key:

    synchronization:
      semaphore:
        configMapKeyRef:
          name: my_config_map
          key: lots_of_jobs
        namespace: default
      mutex:
        name: my_uuid
        namespace: default
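The ordering behavior described above can be sketched with a toy model (this is not Argo's sync code; the `job` struct, the `schedule` function, and the single-pass, stop-at-first-blocked loop are all simplifying assumptions):

```go
package main

import "fmt"

// Toy model of in-order lock acquisition: each job must take every lock it
// names, the queue is scanned front to back, and scheduling stops at the
// first job that cannot acquire all of its locks.
type job struct {
	name  string
	locks []string
}

// schedule returns one event per job considered: "<name> running" or
// "<name> pending".
func schedule(queue []job, capacity map[string]int) []string {
	held := map[string]int{}
	var events []string
	for _, j := range queue {
		blocked := false
		for _, l := range j.locks {
			if held[l] >= capacity[l] {
				blocked = true
			}
		}
		if blocked {
			events = append(events, j.name+" pending")
			break
		}
		for _, l := range j.locks {
			held[l]++
		}
		events = append(events, j.name+" running")
	}
	return events
}

func main() {
	capacity := map[string]int{
		"mutex/my_uuid":          1,   // mutexes admit a single holder
		"mutex/other":            1,
		"semaphore/lots_of_jobs": 200, // plenty of semaphore room
	}
	var queue []job
	for i := 1; i <= 5; i++ {
		queue = append(queue, job{fmt.Sprintf("same-mutex-%d", i),
			[]string{"mutex/my_uuid", "semaphore/lots_of_jobs"}})
	}
	// A job behind the five, holding a different mutex.
	queue = append(queue, job{"other-mutex",
		[]string{"mutex/other", "semaphore/lots_of_jobs"}})

	for _, e := range schedule(queue, capacity) {
		fmt.Println(e)
	}
}
```

In this sketch only same-mutex-1 runs; same-mutex-2 blocks the head of the queue, so other-mutex is never even considered despite the semaphore having well over 190 free slots.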
Logs from the workflow controller
Logs from your workflow's wait container