rm_stm: couple of stability fixes noticed when down scaling max_concurrent_producer_ids #18573
Conversation
/dt
Seems good. Isn't part of the problem that the evict loop here is not async, so the tasks just accumulate without getting any chance to run?
How often does the eviction tick run? With only 100 per tick, is it possible that the PIDs can now grow faster than we evict them? Instead of spawning one task per PID, it seems better to just collect all the PIDs we are going to evict and spawn off a single task to do the eviction. This would also solve the specific allocation failure.
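A minimal sketch of that suggestion (hypothetical names, not the actual producer_state_manager API): collect the eviction candidates while the loop is still synchronous, then run one background task for the whole batch instead of one task per PID.

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>
#include <vector>

struct producer_id {
    long id;
};

class producer_state_manager_sketch {
public:
    void evict_tick() {
        // No scheduling points here, so list iterators stay valid while
        // candidates are collected.
        std::vector<producer_id> to_evict = collect_candidates();
        // One fiber for the entire batch instead of one per PID.
        (void)seastar::with_gate(_gate, [pids = std::move(to_evict)]() mutable {
            return evict_batch(std::move(pids));
        });
    }

private:
    // Placeholder: the real code would walk the shard-wide producer list.
    std::vector<producer_id> collect_candidates() { return {}; }

    // Placeholder: the real code would tear down each producer's state.
    static seastar::future<> evict_batch(std::vector<producer_id>) {
        return seastar::make_ready_future<>();
    }

    seastar::gate _gate;
};
```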
Ya right, the original intention was to not have a scheduling point during the eviction tick, so the iterators in the list it is traversing are not invalidated. I have another idea that gets rid of the async evict function and seems to be working locally; I'll clean it up and push it shortly.
It runs every 5s, ya, probably 100 per tick is too small. I think with a non-futurized evict implementation (next PS), we can evict more in one go.
When a lot of partitions start up on the shard at the same time, we noticed crashes in this part of the code when the snapshot sizes are non-trivial (a large number of producers in the snapshot). This patch releases already-applied snapshot state to ease the memory pressure a bit.
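A rough illustration of the idea (hypothetical types, not the real rm_stm snapshot code): each producer entry is moved out of the decoded snapshot and dropped as soon as it is applied, so the already-consumed portion is released instead of the whole snapshot staying resident until apply finishes.

```cpp
#include <deque>
#include <utility>

struct producer_snapshot_entry {
    // decoded per-producer state (placeholder)
};

// Hypothetical apply step for a single producer.
void apply_producer(producer_snapshot_entry) {}

struct snapshot_data {
    std::deque<producer_snapshot_entry> producers;
};

// Apply entries one at a time and drop each immediately, easing memory
// pressure when many partitions apply large snapshots on the same shard.
void apply_snapshot(snapshot_data snap) {
    while (!snap.producers.empty()) {
        auto entry = std::move(snap.producers.front());
        snap.producers.pop_front();
        apply_producer(std::move(entry));
    }
}
```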
Force-pushed from 3b0ac32 to 8db9579
/ci-repeat 5
Force-pushed from 8db9579 to 2ad9dfd
/dt
/ci-repeat 5
Known failure: #12897
.. to be used in the next commit.
This test ensures concurrent evictions can happen in the presence of replication operations and operations that reset the state (snapshots, partition stop).
Prior to this commit, producer_state::evict() was asynchronous because it waited on a gate to drain any pending tasks on the producer state. This resulted in a fiber per evict() call in the producer_state_manager, producing a large number of fibers when evicting a ton of producers in one eviction tick (which manifested as a large allocation). This commit fixes the issue by making evict() synchronous and getting rid of the gate that was forcing it. That is ok because the eviction tick runs synchronously (without scheduling points) and only considers candidates that are still linked to the shard-wide list. Any caller function using the producer state either waits on the op_lock or detaches from the list before running the operation, ensuring that it is not considered for eviction.
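A sketch of what such a synchronous eviction tick can look like (simplified, hypothetical names, not the actual producer_state code): the tick walks a shard-wide intrusive list with no scheduling points, so unlinking an element cannot race with a suspended iteration, and evict() itself no longer waits on a gate or returns a future.

```cpp
#include <boost/intrusive/list.hpp>
#include <cstddef>

namespace bi = boost::intrusive;

struct producer_state : bi::list_base_hook<bi::link_mode<bi::auto_unlink>> {
    // Placeholder eligibility check; the real code would consult the
    // producer's op_lock / usage state.
    bool can_evict() const { return true; }
    // Synchronous eviction: just unlink from the shard-wide list.
    void evict() { unlink(); }
};

// auto_unlink hooks require constant_time_size<false>.
using producer_list = bi::list<producer_state, bi::constant_time_size<false>>;

// Runs without yielding; only producers still linked to the shard-wide list
// are considered, so callers that detached or hold the lock are skipped.
size_t evict_tick(producer_list& producers, size_t max_to_evict) {
    size_t evicted = 0;
    for (auto it = producers.begin();
         it != producers.end() && evicted < max_to_evict;) {
        auto& p = *it++; // advance before unlinking the current element
        if (p.can_evict()) {
            p.evict();
            ++evicted;
        }
    }
    return evicted;
}
```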
Force-pushed from 2ad9dfd to 63bb606
Were you able to reproduce the crash with the new tests you added?
The new tests in this PR mainly validate the correctness and memory safety of eviction, not the OOM issue we ran into. I'm not able to reproduce the exact OOM that the user hit from spawning too many tasks at once; perhaps it needs a lot of background noise in the test to ensure fragmentation. I'm pushing a small test that ensures only a limited number of producers are evicted in each tick, and that, combined with the fact that there is no longer a task per eviction, probably addresses the issue.
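A rough sketch of the kind of check described (hypothetical stand-in harness, not the fixture in this PR): create more producers than the per-tick budget and assert that a single tick evicts at most that many.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct fake_producer {
    bool evicted = false;
};

// Hypothetical stand-in for one eviction tick with a fixed budget.
size_t run_evict_tick(std::vector<fake_producer>& producers, size_t budget) {
    size_t evicted = 0;
    for (auto& p : producers) {
        if (evicted == budget) {
            break;
        }
        if (!p.evicted) {
            p.evicted = true;
            ++evicted;
        }
    }
    return evicted;
}

int main() {
    std::vector<fake_producer> producers(1000);
    const size_t per_tick_budget = 100;
    // A single tick must not evict more than the configured budget.
    assert(run_evict_tick(producers, per_tick_budget) <= per_tick_budget);
    return 0;
}
```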
Force-pushed from 63bb606 to 60868ed
.. to avoid reactor stalls.
Force-pushed from 60868ed to d7aa167
On a cluster with a large pile-up of historical producer_ids, a sudden downscale of the configuration resulted in crashes.
TBD: Link GH issues.
Backports Required
Release Notes