Memory (RAM) increases exponentially on submitting a large number of workflows and it is not cleared even after the workflows run #14084
Comments
We have a different usage pattern: not that many workflows (10 at a time), but each workflow is gigantic. I don't have any comment on how much memory a controller should use.
Could you share any logs? Or do you see any OOM conditions?
Hi @tczhao. Thanks for replying. We have this configured.
Hi @shuangkun, thanks for the reply. I have attached the logs here.
Try |
We don't see any archived workflows from the above command. We tried to check what is accumulating in heap memory and found that these goroutines accumulate the most. [image]
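For anyone who wants to repeat this kind of check, here is a minimal sketch of tallying goroutines per function from a text goroutine profile. It assumes you already have a dump in the text format that Go's pprof writes at debug level 1 (a `N @ addresses` header followed by `#` frame lines). The function names and paths in the sample are hypothetical, not taken from the controller, and `summarize_goroutines` is just an illustrative helper.

```python
import re
from collections import Counter

# "120 @ 0x43e2c5 0x4c9d3a ..." : number of goroutines sharing one stack.
HEADER = re.compile(r"^(\d+) @ ")
# "#<tab>0x4c9d3a<tab>pkg.Func+0x9a<tab>/path/file.go:42" : one stack frame.
FRAME = re.compile(r"^#\s+0x[0-9a-f]+\s+(\S+?)(?:\+0x[0-9a-f]+)?\s")


def summarize_goroutines(dump: str) -> Counter:
    """Count goroutines per first non-runtime function of each stack."""
    counts = Counter()
    pending = None  # goroutine count of the stack currently being read
    for line in dump.splitlines():
        header = HEADER.match(line)
        if header:
            pending = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and pending is not None:
            name = frame.group(1)
            if name.startswith("runtime."):
                continue  # skip scheduler frames, keep looking down the stack
            counts[name] += pending
            pending = None  # only the first matching frame is attributed
    return counts


# Hypothetical sample dump; names and addresses are made up for illustration.
SAMPLE = "\n".join([
    "goroutine profile: total 153",
    "120 @ 0x43e2c5 0x4c9d3a 0x471e81",
    "#\t0x4c9d3a\texample.com/demo.watchPods+0x9a\t/src/demo/watch.go:42",
    "",
    "30 @ 0x43e2c5 0x4d01ff 0x471e81",
    "#\t0x4d01ff\texample.com/demo.processQueue+0x1f\t/src/demo/queue.go:17",
    "",
    "3 @ 0x43e2c5 0x4c9cf5 0x471e81",
    "#\t0x4c9cf5\texample.com/demo.watchPods+0x55\t/src/demo/watch.go:30",
    "",
])

if __name__ == "__main__":
    for name, count in summarize_goroutines(SAMPLE).most_common():
        print(f"{count:6d}  {name}")
```

Comparing two such summaries taken some minutes apart should show which functions keep gaining goroutines. Stacks that consist only of `runtime.` frames are ignored by this sketch.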
Pre-requisites
I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
Description:
Memory (RAM) increases exponentially on submitting a large number of workflows and it is not cleared even after the workflows run
Environment:
Argo Workflows version: 3.5.12
Parallel workflows: 6000+
What happened:
The Argo Workflows Controller's memory consumption increases exponentially, and the controller crashes frequently. It does not log any specific error messages before these crashes, which makes it hard to pinpoint the underlying cause.
We therefore profiled the memory of the Argo controller pod and ran some experiments to check whether the problem can be mitigated.
How to reproduce it (as minimally and precisely as possible):
Set up an environment with 300+ nodes.
Launch 5000+ workflows in parallel.
Monitor the RAM usage of the Argo Workflows Controller and note any unexpected crashes.
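To put a rough shape on the growth seen while monitoring, a small sketch like the following compares how well the memory samples fit a straight line versus an exponential curve. It assumes you have collected `(seconds, bytes)` samples yourself by whatever means. The `growth_shape` helper and the sample numbers are hypothetical, and comparing R² values across raw and log scale is only a heuristic, not a rigorous test.

```python
import math


def r_squared(xs, ys):
    """R^2 of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate input: no spread in time or in memory
    return (sxy * sxy) / (sxx * syy)


def growth_shape(samples):
    """Return 'exponential' or 'linear' for (time, memory_bytes) samples.

    Exponential growth is linear in log(memory), so the log-scale fit is
    compared against the raw-scale fit. Memory values must be positive.
    """
    times = [t for t, _ in samples]
    memory = [m for _, m in samples]
    linear_fit = r_squared(times, memory)
    exponential_fit = r_squared(times, [math.log(m) for m in memory])
    return "exponential" if exponential_fit > linear_fit else "linear"


if __name__ == "__main__":
    # Made-up samples: memory doubling every interval vs. growing steadily.
    doubling = [(0, 100), (60, 200), (120, 400), (180, 800), (240, 1600)]
    steady = [(0, 100), (60, 150), (120, 200), (180, 250), (240, 300)]
    print(growth_shape(doubling))
    print(growth_shape(steady))
```

With only a handful of samples the two fits can be close, so more samples over a longer window give a more trustworthy answer.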
Version(s)
v3.5.8, v3.5.10, v3.5.12
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container