leader_balancer: implement and use even_node_load_constraint #24403
d555bc2 to 866fcaa
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/59199#01938f10-c755-43d5-8b2c-e4ef0e5f0bb1
Retry command for Build#59199: please wait until all jobs are finished before running the slash command
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f10-c757-49b3-b82e-bc73aab958ca:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f16-88ee-4361-8861-e4873a4b5470:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f16-88f0-4745-84ab-85b9672d2dea:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f10-c758-48ed-bddd-0a3728aa65bd:
What's an example of the imbalance that occurs when only taking cores into account?
@dotnwat It's in the unit test - something like this:
From the shard POV the distribution is balanced - each shard has 0 or 1 leaders. But node-wise distribution is very imbalanced.
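To make the point above concrete, here is a minimal sketch of such a layout. It is not the layout from the unit test; the two-node, four-cores-per-node cluster and the names `shard_id`, `load_summary`, `summarize`, and `skewed_example` are assumptions made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical types for illustration only; not the balancer's real data model.
struct shard_id {
    int node;
    int core;
};

struct load_summary {
    int min_per_shard = 0;
    int max_per_shard = 0;
    int min_per_node = 0;
    int max_per_node = 0;
};

// leaders_on[i] is the number of raft group leaders hosted on shards[i].
inline load_summary summarize(
  const std::vector<shard_id>& shards, const std::vector<int>& leaders_on) {
    load_summary s;
    if (shards.empty()) {
        return s;
    }
    std::map<int, int> per_node;
    s.min_per_shard = s.max_per_shard = leaders_on[0];
    for (std::size_t i = 0; i < shards.size(); ++i) {
        s.min_per_shard = std::min(s.min_per_shard, leaders_on[i]);
        s.max_per_shard = std::max(s.max_per_shard, leaders_on[i]);
        per_node[shards[i].node] += leaders_on[i];
    }
    s.min_per_node = s.max_per_node = per_node.begin()->second;
    for (const auto& entry : per_node) {
        s.min_per_node = std::min(s.min_per_node, entry.second);
        s.max_per_node = std::max(s.max_per_node, entry.second);
    }
    return s;
}

// Assumed layout: 2 nodes x 4 cores, 4 raft groups, all led from node 0.
inline load_summary skewed_example() {
    std::vector<shard_id> shards;
    std::vector<int> leaders;
    for (int node = 0; node < 2; ++node) {
        for (int core = 0; core < 4; ++core) {
            shards.push_back({node, core});
            leaders.push_back(node == 0 ? 1 : 0);
        }
    }
    return summarize(shards, leaders);
}
```

In this assumed layout every shard holds 0 or 1 leaders, so a per-shard objective cannot improve on it, yet one node holds all four leaders and the other holds none.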
return reassignment_opt;
auto node_load_diff = _enlc.evaluate(reassignment);
if (node_load_diff < -error_jitter) {
    continue;
This check doesn't make a difference, as we anyway continue the loop if the one below does not hold. Is it for the future when we evaluate more constraints of even lower priority below?
Yeah, it is just for symmetry. Although I guess one more constraint and I'm going to put them in a vector :)
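For reference, the "constraints in a vector" idea from the reply above could look roughly like the sketch below. This is not the PR's code: `reassignment`, `constraint_fn`, and `pick_reassignment` are hypothetical names, and the sign convention (positive diff means improvement) is an assumption inferred from the `< -error_jitter` check in the quoted hunk.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a leadership move; not the balancer's real type.
struct reassignment {
    int group;
    int from;
    int to;
};

// Assumed convention: a positive value means the move improves the constraint,
// a negative value means it makes the constraint worse.
using constraint_fn = std::function<double(const reassignment&)>;

// Constraints are ordered from highest to lowest priority. A candidate is
// rejected as soon as it worsens a constraint, accepted as soon as it improves
// one, and passed on to the next constraint when the change is within jitter.
inline std::optional<reassignment> pick_reassignment(
  const std::vector<reassignment>& candidates,
  const std::vector<constraint_fn>& constraints_by_priority,
  double error_jitter) {
    for (const auto& candidate : candidates) {
        for (const auto& evaluate : constraints_by_priority) {
            const double diff = evaluate(candidate);
            if (diff < -error_jitter) {
                break; // worsens a higher-priority objective: try next candidate
            }
            if (diff > error_jitter) {
                return candidate; // clear improvement at this priority level
            }
            // Neutral within jitter: let the next constraint break the tie.
        }
    }
    return std::nullopt;
}
```

With this shape each constraint gets the same symmetric pair of checks, so a lower-priority constraint only influences the result when every higher-priority one is indifferent to the move.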
Ahh, I see. Too bad we can't have half a leader. Missed the unit test explanation, thanks!
The leader balancer treats all CPU cores available in the cluster as independent and balances leadership among them rather than among nodes (i.e. the objective is that each core, not each node, has the same number of leaders). This leads to unintuitive results, especially when the number of raft groups is comparable to the number of available shards. Add a low-priority node balancing constraint to fix that.
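One way to picture what a node balancing constraint might score is sketched below. It is an assumption for illustration, not the implementation of `even_node_load_constraint`: the error metric (sum of squared deviations from the mean per-node leader count) and the names `node_load_error` and `evaluate_move` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Assumed error metric: sum of squared deviations of each node's leader count
// from the cluster-wide mean. Zero means perfectly even node load.
inline double node_load_error(const std::vector<int>& leaders_per_node) {
    if (leaders_per_node.empty()) {
        return 0.0;
    }
    double total = 0.0;
    for (int n : leaders_per_node) {
        total += n;
    }
    const double mean = total / static_cast<double>(leaders_per_node.size());
    double error = 0.0;
    for (int n : leaders_per_node) {
        const double d = n - mean;
        error += d * d;
    }
    return error;
}

// Returns error_before - error_after for moving one leader between nodes.
// Positive means the move evens out node load, negative means it skews it.
inline double evaluate_move(
  std::vector<int> leaders_per_node, std::size_t from, std::size_t to) {
    const double before = node_load_error(leaders_per_node);
    leaders_per_node[from] -= 1;
    leaders_per_node[to] += 1;
    return before - node_load_error(leaders_per_node);
}
```

Under these assumptions, moving one leader off a node that holds 4 onto a node that holds 0 scores positively, while moving a leader between two nodes that each hold 2 scores negatively.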
Previously it didn't copy empty shards in the index.
Makes it easier to read the stats.
866fcaa to 1305cdd
Give more time for muted groups to unmute.
1305cdd to 049e258
/backport v24.3.x
Fixes https://redpandadata.atlassian.net/browse/CORE-8237
Backports Required
Release Notes
Improvements