Shard selecting load balancing #944

Lorak-mmk · 2024-03-02T02:26:40Z

This is a more polished version of #791.
The changes from #791 are:

Totally reworked the commit structure, so that each commit actually compiles and passes all tests and checks. Doing this took quite a lot of time. The original commit structure was... a bit chaotic.
Fixed all the warnings.
Removed some functions that were no longer used.
Updated a comment that had become incorrect.

Posting the original description, slightly modified, as it is still relevant:

Motivation

Most of our drivers, being inherited from Cassandra, load balance only over nodes, not specific shards. Multiple ideas have arisen that could benefit from shard-selecting load balancing. Among them:

shard-aware batching (see #738, "Shard aware batching - add Session::shard_for_statement & Batch::enforce_target_node");
tablets support: with tablets enabled (ATM experimental in ScyllaDB), the target shard is not derived from the token (computed from the partition key), but rather read from system.tablets. Therefore, a load balancer should be able to decide on a target shard on its own, by abstracting over either the token ring or tablets being used for cluster topology;
overloaded shard optimisation: some tests have shown that sometimes, when a shard is particularly overloaded, it may be beneficial (performance-wise) to send the request to the proper node, but a wrong shard. That shard would then do part of the work that the overloaded shard would otherwise have to do itself.

Design

The LB policy now returns a (NodeRef, Shard) pair, enabling finer-grained control over targeted shards.
Regarding tablets support: ReplicaLocator is the place where the abstraction over either the token ring or tablets is to be implemented. Ideally, the LB policy does not have to be aware of the actual mechanism (token ring or tablets) being used for a particular query.
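To make the design above concrete, here is a minimal sketch of a policy that returns a (NodeRef, Shard) pair while staying unaware of how replicas are located. Everything in it is an assumption made for illustration: `Node`, `ReplicaSource`, `TableSource`, `ShardAwarePolicy` and `FirstReplica` are hypothetical stand-ins, not the driver's actual types or signatures.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the driver's types (assumptions, not the real API).
pub type Shard = u32;

#[derive(Debug, PartialEq)]
pub struct Node {
    pub name: &'static str,
}

pub type NodeRef<'a> = &'a Arc<Node>;

/// Plays the role the description assigns to ReplicaLocator: it hides whether
/// replicas come from a token ring or from a tablet-like lookup table.
pub trait ReplicaSource {
    fn replicas_for(&self, token: i64) -> Vec<(NodeRef<'_>, Shard)>;
}

/// Toy tablet-like source: the shard is read from a table instead of being
/// derived from the token. Entries are (last_token, node, shard), sorted by last_token.
pub struct TableSource {
    pub entries: Vec<(i64, Arc<Node>, Shard)>,
}

impl ReplicaSource for TableSource {
    fn replicas_for(&self, token: i64) -> Vec<(NodeRef<'_>, Shard)> {
        self.entries
            .iter()
            // First range whose upper bound covers the token.
            .find(|(last_token, _, _)| token <= *last_token)
            .map(|(_, node, shard)| vec![(node, *shard)])
            .unwrap_or_default()
    }
}

/// The policy returns a node together with a target shard.
pub trait ShardAwarePolicy {
    fn pick<'a>(&self, token: i64, source: &'a dyn ReplicaSource) -> Option<(NodeRef<'a>, Shard)>;
}

/// Simplest possible policy: take the first replica the source reports.
pub struct FirstReplica;

impl ShardAwarePolicy for FirstReplica {
    fn pick<'a>(&self, token: i64, source: &'a dyn ReplicaSource) -> Option<(NodeRef<'a>, Shard)> {
        source.replicas_for(token).into_iter().next()
    }
}
```

Because `FirstReplica` only talks to the `ReplicaSource` trait, swapping the table-based source for a token-ring-based one would not require touching the policy, which is the property the design aims for.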

What's done

internal and public load-balancing-related interfaces are changed from NodeRef to (NodeRef, Shard) pair,
shard selection logic is removed from NodeConnectionPool; a method is added there that returns a connection to a specific shard,
Session's logic propagates the load balancing policy's target shard down to the connection pool,
a stub implementation of shard selection is added to ReplicaLocator. At the moment, it simply computes the shard based on the token, the same way as it was done in the connection pool layer before.
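The token-based shard computation mentioned in the last item can be sketched as follows. This follows the token-to-shard mapping as I understand it from ScyllaDB's shard-awareness documentation (bias the token into unsigned space, drop the ignored most significant bits, then scale by the shard count); the `Sharder` struct and its field names here are illustrative assumptions and may not match the driver's code.

```rust
pub type Shard = u32;

/// Sharding parameters a node advertises (names are assumptions for this sketch).
#[derive(Debug, Clone, Copy)]
pub struct Sharder {
    pub nr_shards: u16,
    pub msb_ignore: u8,
}

impl Sharder {
    /// Maps a token to a shard in the range 0..nr_shards.
    pub fn shard_of(&self, token: i64) -> Shard {
        // Shift the signed token range into the unsigned range 0..2^64.
        let biased = (token as u64).wrapping_add(1u64 << 63);
        // Discard the most significant bits the node asked us to ignore.
        // A shift of 64 or more would discard everything, hence the fallback to 0.
        let biased = biased.checked_shl(u32::from(self.msb_ignore)).unwrap_or(0);
        // Scale the 64-bit value onto nr_shards buckets using 128-bit arithmetic.
        ((u128::from(biased) * u128::from(self.nr_shards)) >> 64) as Shard
    }
}
```

With `msb_ignore` set to 0 and four shards, the token range splits into four equal contiguous parts, so the smallest token lands on shard 0 and the largest on shard 3.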

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

wprzytula · 2024-03-05T14:29:54Z

scylla/src/transport/load_balancing/default.rs

    latency_awareness: Option<LatencyAwareness>,
-    fixed_shuffle_seed: Option<u64>,
+    fixed_seed: Option<u64>,


Justification for this name change: the seed is now not only used for shuffling nodes, but also for sampling a random shard in case we don't know the proper one.
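As a rough illustration of why one fixed seed can serve both purposes, the sketch below samples a fallback shard deterministically when the proper shard is unknown. The random source used by the driver is not shown in this thread; a small SplitMix64 generator is swapped in here only to keep the example dependency-free, and `shard_or_random` is a hypothetical helper name.

```rust
/// Tiny deterministic generator (SplitMix64), used here instead of an external crate.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Returns the known shard if there is one, otherwise samples one from the
/// seeded generator (hypothetical helper, not a driver function).
pub fn shard_or_random(known: Option<u32>, nr_shards: u32, rng: &mut SplitMix64) -> u32 {
    // max(1) guards against a modulo by zero if a node reports no shards.
    known.unwrap_or_else(|| (rng.next_u64() % u64::from(nr_shards.max(1))) as u32)
}
```

With a fixed seed, two runs produce the same sequence of fallback shards, which is what makes tests that depend on the chosen shard reproducible.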

For now, the returned shards are ignored by the rest of the code. Previously, shard calculations were done by ConnectionPool, but for a shard-aware LoadBalancingPolicy to exist, it must get sharded replicas from the locator. Co-authored-by: Wojciech Przytuła <[email protected]>

github-actions · 2024-03-12T14:02:44Z

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 28ae015

wprzytula · 2024-03-12T14:09:39Z

> cargo semver-checks found no API-breaking changes in this PR! 🎉🥳

Well, that's unfortunately not true, but it's a known limitation of semver-checks...

The key takeaway is to not trust semver-checks fully, as there can be quite a lot of false negatives.

piodul · 2024-03-12T14:13:47Z

> > cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
>
> Well, that's unfortunately not true, but it's a known limitation of semver-checks...
>
> The key takeaway is to not trust semver-checks fully, as there can be quite a lot of false negatives.

The message indeed sounds too confident; perhaps we should add a caveat about possible false negatives.

wprzytula

Mostly LGTM, some nits only.

Lorak-mmk requested review from piodul and avelanarius March 2, 2024 02:26

Lorak-mmk mentioned this pull request Mar 2, 2024

Introduce support for Tablets #937

Merged

18 tasks

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch 4 times, most recently from 7d17d2a to 3a79c2f Compare March 5, 2024 11:44

Lorak-mmk self-assigned this Mar 5, 2024

wprzytula reviewed Mar 5, 2024

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 3a79c2f to 13fca82 Compare March 5, 2024 17:31

wprzytula approved these changes Mar 6, 2024

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch 2 times, most recently from e6bb969 to cf8e21b Compare March 9, 2024 21:21

wprzytula and others added 7 commits March 12, 2024 14:57

locator: ReplicaLocator::new(): clone only late

efa3536

This way, we avoid clones of datacenters that appear multiple times.

Use imported NodeRef

f6b75f7

This is more readable in my opinion. Co-authored-by: Wojciech Przytuła <[email protected]>

Make LoadBalancingPolicy shard-aware

5a82a92

Co-authored-by: Wojciech Przytuła <[email protected]>

load_balancing: make Plan shard-aware

55e6c80

Co-authored-by: Wojciech Przytuła <[email protected]>

Cluster: get_endpoints and related methods return shards

ae069e8

Co-authored-by: Wojciech Przytuła <[email protected]>

Use shard from query plan during execution

48aa642

Co-authored-by: Wojciech Przytuła <[email protected]>

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from cf8e21b to 48aa642 Compare March 12, 2024 13:57

wprzytula requested changes Mar 13, 2024

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 86c03dd to 16dcc19 Compare March 13, 2024 13:37

Lorak-mmk requested a review from wprzytula March 13, 2024 13:38

wprzytula approved these changes Mar 13, 2024

Docs: Informations about shard-aware LBPs

28ae015

This commit also includes some changes that were omitted during older changes to LBP.

Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 16dcc19 to 28ae015 Compare March 13, 2024 14:45

Lorak-mmk requested a review from wprzytula March 13, 2024 14:45

wprzytula approved these changes Mar 14, 2024

wprzytula merged commit 8e845e7 into scylladb:main Mar 14, 2024
12 checks passed

wprzytula mentioned this pull request Mar 19, 2024

Shard aware batching - add Session::shard_for_statement & Batch::enforce_target_node #738

Open

8 tasks

Lorak-mmk mentioned this pull request Mar 20, 2024

speculative_execution test flakiness #967

Closed

Lorak-mmk mentioned this pull request May 9, 2024

Release 0.13 #995

Merged

dkropachev mentioned this pull request Jun 19, 2024

Extends gocql API for scylla shard aware info scylladb/gocql#164

Closed
