Currently, we can run MAX_WORKERS jobs, each with up to 128GB of memory (a global config). At time of writing MAX_WORKERS is 20, so that's a potential of 2560GB. We have ~610GB on TPP, so we are ~4x overcommitted at peak. Normally this is fine, as most jobs use a lot less memory than this. Occasionally, this is not true, and while each job is below 128GB, 20 of them together exceed 610GB, and we get the bad kind of OOM behaviour (the kernel has to choose which docker process to kill, which is the OS equivalent of Undefined Behaviour).
The proper solution for this is probably to enable per-job limits. Most jobs can run under a limit of 64GB or 32GB; only some need 128GB. This would also enable size-based scheduling, which is well understood (think cloud VM scheduling).
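A minimal sketch of what per-job limits could look like, assuming jobs are launched via `docker run` and that a job can carry an optional memory limit. The function and field names here are illustrative, not the actual job-runner API:

```python
# Hypothetical default: most jobs fit comfortably in this limit.
DEFAULT_LIMIT_GB = 32


def docker_memory_args(job_limit_gb=None):
    """Build docker CLI flags enforcing a hard per-job memory cap.

    Setting --memory-swap equal to --memory prevents the container
    from spilling past its limit into swap, so the container's own
    OOM killer fires instead of the host's.
    """
    limit = job_limit_gb or DEFAULT_LIMIT_GB
    return ["--memory", f"{limit}g", "--memory-swap", f"{limit}g"]


# A heavyweight job would opt in to the full allowance:
args = docker_memory_args(128)  # ["--memory", "128g", "--memory-swap", "128g"]
```

With per-job limits in place, the scheduler can sum the declared limits of running jobs and only admit a new one if the total stays under physical memory, which is the same bin-packing model cloud providers use.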
A dumber and simpler option might be to just check for a minimum amount of free memory before executing a job. This should be fairly simple to implement, and would apply dynamic backpressure without complicated scheduling algos. We could do the same with disk too, perhaps.
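That check could be as simple as reading `MemAvailable` from `/proc/meminfo` before dispatching each job. A sketch, with the threshold value (128GB, matching the global per-job cap) chosen purely for illustration:

```python
import re

# Hypothetical threshold: don't admit a job unless at least one
# worst-case job's worth of memory is free.
MIN_FREE_GB = 128


def available_gb(meminfo_text):
    """Parse the MemAvailable line (reported in kB) from /proc/meminfo."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) / (1024 * 1024)


def can_start_job(meminfo_path="/proc/meminfo"):
    """Backpressure check: only start a new job if enough memory is free."""
    with open(meminfo_path) as f:
        return available_gb(f.read()) >= MIN_FREE_GB
```

Jobs that fail the check would simply stay queued until memory frees up, with no changes to the scheduling logic itself.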