Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shard selecting load balancing #944

Merged
merged 8 commits into from
Mar 14, 2024

Conversation

Lorak-mmk
Copy link
Collaborator

@Lorak-mmk Lorak-mmk commented Mar 2, 2024

This is a more polished version of #791
The changes from #791 are:

  • Totally reworked commit structure, so that each commit actually compiles and passes all tests and checks. Doing this took quite a lot of time. Original commit structure was... a bit chaotic.
  • Fixed all the warnings
  • Removed some functions that were no longer used.
  • Updated a comment that became incorrect

Posting the original description, slightly modified, as it is still relevant:

Motivation

Most of our drivers, being inherited from Cassandra, load balance only over nodes, not specific shards. Multiple ideas have arised that could benefit from having a shard-selecting load balancing. Among them:

  • shard-aware batching (Shard aware batching - add Session::shard_for_statement & Batch::enforce_target_node #738);
  • tablets support:
    with tablets enabled (ATM experimental in ScyllaDB), target shard is not derived from token (computed from partition key), but rather read from system.tablets. Therefore, a load balancer should be able to decide a target shard on its own, by abstracting over either token ring or tablets being used for cluster topology.
  • overloaded shard optimisation:
    some tests have shown that sometimes, when a shard is particularly overloaded, it may be beneficial (performance-wise) to send the request to the proper node, but a wrong shard. That shard would then do part of the work that the overloaded shard would else have to do itself.

Design

  • LB policy now is to return a (NodeRef, Shard) pair, enabling finer-grained control over targeted shards.
  • regarding tablets support: ReplicaLocator is the place where the abstraction over either token ring or tablets is to be implemented. Ideally, the LB policy does not have to be aware of the actual mechanism (token ring or tablets) being used for a particular query.

What's done

  • internal and public load-balancing-related interfaces are changed from NodeRef to (NodeRef, Shard) pair,
  • shard selection logic is removed from NodeConnectionPool; a method is added there that returns a connection to a specific shard,
  • Session's logic propagates the load balancing policy's target shard down to the connection pool,
  • a stub implementation of shard selection is added to ReplicaLocator. At the moment, it simply computes the shard based on the token, the same way as it was done in the connection pool layer before.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk requested review from piodul and avelanarius March 2, 2024 02:26
@Lorak-mmk Lorak-mmk mentioned this pull request Mar 2, 2024
18 tasks
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch 4 times, most recently from 7d17d2a to 3a79c2f Compare March 5, 2024 11:44
@Lorak-mmk Lorak-mmk self-assigned this Mar 5, 2024
scylla/src/transport/load_balancing/default.rs Outdated Show resolved Hide resolved
scylla/src/transport/load_balancing/default.rs Outdated Show resolved Hide resolved
latency_awareness: Option<LatencyAwareness>,
fixed_shuffle_seed: Option<u64>,
fixed_seed: Option<u64>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Justification for this name change: the seed is now not only used for shuffling nodes, but also for sampling a random shard in case we don't know the proper one.

scylla/tests/integration/execution_profiles.rs Outdated Show resolved Hide resolved
scylla/src/transport/connection_pool.rs Outdated Show resolved Hide resolved
scylla/src/transport/session.rs Outdated Show resolved Hide resolved
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 3a79c2f to 13fca82 Compare March 5, 2024 17:31
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch 2 times, most recently from e6bb969 to cf8e21b Compare March 9, 2024 21:21
wprzytula and others added 7 commits March 12, 2024 14:57
This way, we avoid clones of datacenters that appear multiple times.
This is more readable in my opinion.


Co-authored-by: Wojciech Przytuła <[email protected]>
For now the returned shards will be ignored by rest of the code.
Previously shard calculations were done by ConnectionPool,
but in order to allow shard-aware LoadBalancingPolicy to exist,
it must get sharded replicas from locator.


Co-authored-by: Wojciech Przytuła <[email protected]>
Co-authored-by: Wojciech Przytuła <[email protected]>
Co-authored-by: Wojciech Przytuła <[email protected]>
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from cf8e21b to 48aa642 Compare March 12, 2024 13:57
Copy link

github-actions bot commented Mar 12, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 28ae015

@wprzytula
Copy link
Collaborator

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳

Well, that's unfortunately not true, but it's a known limitation of semver-checks...

The key takeaway is to not trust semver-checks fully, as there can be quite a lot of false negatives.

@piodul
Copy link
Collaborator

piodul commented Mar 12, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳

Well, that's unfortunately not true, but it's a known limitation of semver-checks...

The key takeaway is to not trust semver-checks fully, as there can be quite a lot of false negatives.

The message indeed sounds too confident, perhaps we should add some caution about the possible false negatives.

docs/source/load-balancing/load-balancing.md Outdated Show resolved Hide resolved
docs/source/load-balancing/load-balancing.md Outdated Show resolved Hide resolved
docs/source/load-balancing/load-balancing.md Show resolved Hide resolved
docs/source/load-balancing/load-balancing.md Outdated Show resolved Hide resolved
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 86c03dd to 16dcc19 Compare March 13, 2024 13:37
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 13, 2024 13:38
Copy link
Collaborator

@wprzytula wprzytula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, some nits only.

scylla/src/transport/load_balancing/mod.rs Outdated Show resolved Hide resolved
scylla/src/transport/load_balancing/mod.rs Outdated Show resolved Hide resolved
docs/source/load-balancing/load-balancing.md Outdated Show resolved Hide resolved
docs/source/load-balancing/load-balancing.md Outdated Show resolved Hide resolved
scylla/src/transport/load_balancing/mod.rs Show resolved Hide resolved
This commit also includes some changes that were omitted during older
changes to LBP.
@Lorak-mmk Lorak-mmk force-pushed the shard-selecting-lb-v2 branch from 16dcc19 to 28ae015 Compare March 13, 2024 14:45
@Lorak-mmk Lorak-mmk requested a review from wprzytula March 13, 2024 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants