
leader_balancer: implement and use even_node_load_constraint #24403

Merged
merged 5 commits into redpanda-data:dev on Dec 4, 2024

Conversation

@ztlpn ztlpn (Contributor) commented Dec 2, 2024

The leader balancer treats all CPU cores available in the cluster as independent and balances leadership among cores rather than among nodes (i.e. the objective is that each core, not each node, ends up with the same number of leaders). This leads to unintuitive results, especially when the number of raft groups is comparable to the number of available shards.

Add a low-priority node balancing constraint to fix that.

Fixes https://redpandadata.atlassian.net/browse/CORE-8237
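
For illustration only, here is a minimal standalone sketch of what an even-node-load constraint can look like; the class shares its name with the one in this PR, but the types, the error metric, and the method bodies below are simplified assumptions, not the actual redpanda implementation. The idea is to score a candidate leadership transfer by how much it reduces node-level imbalance: a positive value means the transfer improves node balance, a negative value means it regresses it (the same sign convention as the node_load_diff check discussed in the review thread below).

    // Hypothetical, simplified sketch (not the redpanda implementation):
    // score a leadership transfer by the change in node-level imbalance.
    #include <cmath>
    #include <unordered_map>
    #include <utility>

    struct shard_id {
        int node;
        int core;
    };

    // A candidate leadership transfer between two shards.
    struct reassignment {
        shard_id from;
        shard_id to;
    };

    class even_node_load_constraint {
    public:
        explicit even_node_load_constraint(
          std::unordered_map<int, int> leaders_per_node)
          : _leaders_per_node(std::move(leaders_per_node)) {}

        // Node-level error: total absolute deviation from the mean leader count.
        double error() const {
            double mean = 0;
            for (const auto& [node, count] : _leaders_per_node) {
                mean += count;
            }
            mean /= static_cast<double>(_leaders_per_node.size());

            double err = 0;
            for (const auto& [node, count] : _leaders_per_node) {
                err += std::abs(count - mean);
            }
            return err;
        }

        // Positive: the transfer reduces node-level imbalance.
        // Negative: it makes the node-level distribution worse.
        double evaluate(const reassignment& r) const {
            auto counts = _leaders_per_node;
            counts[r.from.node] -= 1;
            counts[r.to.node] += 1;
            even_node_load_constraint after{std::move(counts)};
            return error() - after.error();
        }

    private:
        std::unordered_map<int, int> _leaders_per_node;
    };

Used as a low-priority constraint, a score like this only acts as a tie-breaker: a transfer that already helps the primary per-core objective would still be rejected if it made node-level balance significantly worse.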

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Leader balancer: balance the total number of leaders on each node as well, instead of treating each core as fully independent.

@ztlpn ztlpn force-pushed the leader-balancer-even-node-load branch 2 times, most recently from d555bc2 to 866fcaa on December 3, 2024 23:14
@vbotbuildovich vbotbuildovich (Collaborator) commented Dec 4, 2024

Retry command for Build#59199

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":true,"clean_node_before_recovery":false}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_escape_hatch_license_variable@{"clean_node_before_upgrade":false}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":false,"clean_node_before_recovery":false}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_escape_hatch_license_variable@{"clean_node_before_upgrade":true}
tests/rptest/tests/license_enforcement_test.py::LicenseEnforcementTest.test_license_enforcement@{"clean_node_after_recovery":true,"clean_node_before_recovery":true}

@vbotbuildovich vbotbuildovich (Collaborator) commented Dec 4, 2024

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f10-c757-49b3-b82e-bc73aab958ca:

"rptest.tests.license_enforcement_test.LicenseEnforcementTest.test_license_enforcement.clean_node_before_recovery=True.clean_node_after_recovery=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f16-88ee-4361-8861-e4873a4b5470:

"rptest.tests.license_enforcement_test.LicenseEnforcementTest.test_license_enforcement.clean_node_before_recovery=True.clean_node_after_recovery=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f16-88f0-4745-84ab-85b9672d2dea:

"rptest.tests.license_enforcement_test.LicenseEnforcementTest.test_escape_hatch_license_variable.clean_node_before_upgrade=True"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/59199#01938f10-c758-48ed-bddd-0a3728aa65bd:

"rptest.tests.license_enforcement_test.LicenseEnforcementTest.test_escape_hatch_license_variable.clean_node_before_upgrade=True"

@dotnwat dotnwat (Member) commented Dec 4, 2024

What's an example of the imbalance that occurs when only taking cores into account?

@ztlpn ztlpn (Contributor, Author) commented Dec 4, 2024

@dotnwat It's in the unit test - something like this:

  • 3-node cluster, 10 shards each
  • 20 partitions with rf=3
  • all leaders on nodes 0 and 1

From the shard POV the distribution is balanced - each shard has 0 or 1 leaders. But node-wise distribution is very imbalanced.
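
(Worked out: 20 partitions with rf=3 give 20 leaders spread over 3 × 10 = 30 shards, so any placement in which no shard holds more than one leader already satisfies the per-shard objective, including the one where nodes 0 and 1 hold 10 leaders each and node 2 holds none; a node-level objective would instead aim for roughly 6-7 leaders per node.)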

Review thread on the candidate evaluation loop:

    return reassignment_opt;
    auto node_load_diff = _enlc.evaluate(reassignment);
    if (node_load_diff < -error_jitter) {
        continue;

Contributor:

This check doesn't make a difference, since we continue the loop anyway if the check below it does not hold. Is it here for the future, when we evaluate more constraints of even lower priority below?

ztlpn (Contributor, Author):

Yeah, it is just for symmetry. Although I guess one more constraint and I'm going to put them in a vector :)
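
A rough sketch of the "put them in a vector" idea mentioned above, purely for illustration; the function name, signature, and types are hypothetical and not redpanda's API. Tie-breaker constraints of decreasing priority are walked in order, and a candidate reassignment is dropped as soon as any of them regresses beyond the jitter tolerance:

    // Hypothetical illustration (not the actual redpanda code): each entry
    // returns the improvement a candidate reassignment would bring for one
    // constraint; a negative value means that constraint gets worse.
    #include <functional>
    #include <vector>

    template<typename Reassignment>
    bool acceptable(
      const Reassignment& candidate,
      const std::vector<std::function<double(const Reassignment&)>>& tie_breakers,
      double error_jitter) {
        for (const auto& evaluate : tie_breakers) {
            // Reject the candidate if this constraint regresses beyond the
            // allowed jitter; the caller would then `continue` to the next
            // candidate, mirroring the check in the diff above.
            if (evaluate(candidate) < -error_jitter) {
                return false;
            }
        }
        return true;
    }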

@dotnwat dotnwat (Member) commented Dec 4, 2024

> @dotnwat It's in the unit test - something like this:
>
>   • 3-node cluster, 10 shards each
>   • 20 partitions with rf=3
>   • all leaders on nodes 0 and 1
>
> From the shard POV the distribution is balanced - each shard has 0 or 1 leaders. But node-wise distribution is very imbalanced.

Ahh, I see. Too bad we can't have half a leader. Missed the unit test explanation, thanks!

ztlpn added 3 commits December 4, 2024 17:52

  • Leader balancer treats all CPU cores available in the cluster as independent and tries to balance leadership among them and not among nodes (i.e. the objective is that each core, not each node, has the same number of leaders). This leads to unintuitive results, especially when the number of raft groups is comparable to the number of available shards. Add a low-priority node balancing constraint to fix that.
  • Previously it didn't copy empty shards in the index.
  • Makes it easier to read the stats.
@ztlpn ztlpn force-pushed the leader-balancer-even-node-load branch from 866fcaa to 1305cdd on December 4, 2024 16:52
@ztlpn ztlpn force-pushed the leader-balancer-even-node-load branch from 1305cdd to 049e258 on December 4, 2024 16:54
@ztlpn ztlpn requested a review from bashtanov December 4, 2024 16:55
@ztlpn ztlpn merged commit ef1c094 into redpanda-data:dev Dec 4, 2024
17 checks passed
@vbotbuildovich vbotbuildovich (Collaborator) commented:

/backport v24.3.x
