Better connection management #261
Conversation
Force-pushed: 4228019 → eb11747 → f8424e6 → fbe83d1
@piodul Do you know when this PR can be merged?

@Jasperav It's under review right now, and it also needs a little cleanup wrt. commits and commit messages. Overall, next week is very likely.
Why did you choose to keep `ConnectionKeeper` for managing control connections? It could be replaced with the new `NodeConnectionPool` (and by replacing it, we wouldn't have to maintain two separate connection management implementations).

Except for the fact of not removing `ConnectionKeeper`, this PR LGTM. Left some minor comments.
```rust
// If this fails try getting any other in random order
let mut shards_to_try: Vec<u16> =
    (shard..nr_shards.get()).chain(0..shard).skip(1).collect();
```
I don't think it's clear that `.skip(1)` is used here to exclude `shard` from the sequence. Have you considered using `.filter()`? E.g. `(0..nr_shards.get()).filter(|i| *i != shard)`.
Using `filter` has the following drawbacks:

- There will be an additional check performed for each shard ID.
- The iterator won't be able to provide an accurate `size_hint` due to the use of `filter`, so `collect` won't be able to pre-allocate the right amount of memory for the Vec.

How about this instead (this is what I pushed in the updated version):

```rust
let mut shards_to_try: Vec<u16> = (0..shard).chain(shard + 1..nr_shards.get()).collect();
```
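To make the `size_hint` point concrete, here is a minimal sketch (the shard values are made up for the example):

```rust
fn main() {
    let shard: u16 = 3;
    let nr_shards: u16 = 8;

    // `filter` cannot know in advance how many items pass the predicate,
    // so its size_hint lower bound drops to 0...
    let filtered = (0..nr_shards).filter(|i| *i != shard);
    assert_eq!(filtered.size_hint(), (0, Some(8)));

    // ...while a `chain` of two exact-size ranges reports the exact
    // length, letting `collect` allocate the Vec in a single step.
    let chained = (0..shard).chain(shard + 1..nr_shards);
    assert_eq!(chained.size_hint(), (7, Some(7)));

    let shards_to_try: Vec<u16> = (0..shard).chain(shard + 1..nr_shards).collect();
    assert_eq!(shards_to_try, [0, 1, 2, 4, 5, 6, 7]);
}
```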
```rust
Initializing,
// The pool is empty and is waiting to be refilled
Pending,
```
The `Pending` state is caused by failing to create a working connection; it is not a successor of `Initializing` under favorable conditions. I suggest changing the name to `Broken` or `Failing`, or adding a comment that this state is undesirable.
OK. Changed to `Broken`.
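For reference, a sketch of the renamed state (the enum name and the full variant set are assumptions; only the two variants discussed here are shown):

```rust
// Hypothetical pool-state enum; names other than `Broken` are illustrative.
#[allow(dead_code)]
enum PoolState {
    // The pool is being filled for the first time
    Initializing,
    // Filling the pool failed; it is empty and waiting to be refilled
    Broken,
}
```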
Force-pushed: fbe83d1 → a0560cb
Shard count cannot be zero. Let's use Rust's standard NonZeroU16 type to represent that fact. While the usage of the ShardInfo::nr_shards field became a bit less ergonomic, we get rid of the assert in ShardInfo's constructor.
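As an illustration, a minimal sketch of the idea, assuming a simplified ShardInfo (only the nr_shards field is taken from the commit message):

```rust
use std::num::NonZeroU16;

struct ShardInfo {
    nr_shards: NonZeroU16,
    // ... other sharding parameters omitted in this sketch
}

impl ShardInfo {
    // NonZeroU16::new returns None for zero, so the old assert is
    // replaced by a type that cannot represent an invalid shard count.
    fn new(nr_shards: u16) -> Option<Self> {
        Some(ShardInfo {
            nr_shards: NonZeroU16::new(nr_shards)?,
        })
    }
}

fn main() {
    assert!(ShardInfo::new(0).is_none());
    assert_eq!(ShardInfo::new(4).unwrap().nr_shards.get(), 4);
}
```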
It wasn't used anywhere. It was just returning the `nr_shards` field, which is public and can be accessed directly.
ShardInfo represents information related to sharding which was parsed from the OPTIONS response sent by the database. It contains the id of the connection's shard, the total number of shards, and the msb_ignore parameter used for calculating the shard based on a token. ShardInfo has multiple methods used for calculations related to sharding, e.g. to calculate the shard for a token or to iterate over source ports which can be used to connect to the shard-aware port for a given shard.

However, none of those methods use the first field of ShardInfo, which represents the id of the connection's shard. If a method needs to calculate something for a particular shard id, the id is passed as an argument. This commit moves most of ShardInfo's methods to a new struct, Sharder. The new struct is identical to the first one, but does not have the "connection's shard id" field.
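For illustration, a hedged sketch of such a Sharder; the token-to-shard formula below follows Scylla's documented shard-selection algorithm, but the names and signatures are assumptions, not necessarily the crate's exact API:

```rust
use std::num::NonZeroU16;

struct Sharder {
    nr_shards: NonZeroU16,
    msb_ignore: u8,
}

impl Sharder {
    // Scylla's shard selection: bias the signed token into unsigned
    // space, drop the ignored most-significant bits, then scale the
    // result into the [0, nr_shards) range.
    fn shard_of(&self, token: i64) -> u16 {
        let mut biased = (token as u64).wrapping_add(1u64 << 63);
        biased <<= self.msb_ignore;
        ((biased as u128 * self.nr_shards.get() as u128) >> 64) as u16
    }
}

fn main() {
    let sharder = Sharder {
        nr_shards: NonZeroU16::new(8).unwrap(),
        msb_ignore: 12,
    };
    let shard = sharder.shard_of(42);
    assert!(shard < 8);
}
```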
Adds a method which returns the shard-aware port reported by the connected node.
Adds a method to ConnectionConfig which returns whether the configuration is for an SSL connection. This method works both with and without the "ssl" feature enabled.
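One way such a method can be made feature-independent is to compile a different body per feature flag; a minimal sketch (the SslContext placeholder and the field name are assumptions):

```rust
#[cfg(feature = "ssl")]
struct SslContext; // placeholder standing in for the real SSL context type

#[derive(Default)]
struct ConnectionConfig {
    #[cfg(feature = "ssl")]
    ssl_context: Option<SslContext>,
    // ... other connection options omitted in this sketch
}

impl ConnectionConfig {
    #[cfg(feature = "ssl")]
    fn is_ssl(&self) -> bool {
        self.ssl_context.is_some()
    }

    // Without the "ssl" feature the configuration can never be SSL.
    #[cfg(not(feature = "ssl"))]
    fn is_ssl(&self) -> bool {
        false
    }
}

fn main() {
    let config = ConnectionConfig::default();
    println!("is_ssl: {}", config.is_ssl());
}
```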
A Scylla node can have two shard-aware ports enabled - one for unencrypted connections, and another for SSL connections. They are reported separately in the SUPPORTED response under two keys:

- SCYLLA_SHARD_AWARE_PORT
- SCYLLA_SHARD_AWARE_PORT_SSL

When establishing SSL connections, we didn't consider the second one at all and only looked at the first one. This commit fixes that, and now the `is_shard_aware` method correctly detects shard-aware SSL connections.
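Illustratively, the fix boils down to choosing the lookup key based on whether the connection uses SSL (a hedged sketch; the helper name and the flattened map type are made up for the example):

```rust
use std::collections::HashMap;

// Hypothetical helper: pick which SUPPORTED key to read for the
// shard-aware port, depending on whether the connection uses SSL.
fn shard_aware_port(supported: &HashMap<String, String>, is_ssl: bool) -> Option<u16> {
    let key = if is_ssl {
        "SCYLLA_SHARD_AWARE_PORT_SSL"
    } else {
        "SCYLLA_SHARD_AWARE_PORT"
    };
    supported.get(key)?.parse().ok()
}

fn main() {
    let mut supported = HashMap::new();
    supported.insert("SCYLLA_SHARD_AWARE_PORT".to_string(), "19042".to_string());
    supported.insert("SCYLLA_SHARD_AWARE_PORT_SSL".to_string(), "19142".to_string());
    assert_eq!(shard_aware_port(&supported, false), Some(19042));
    assert_eq!(shard_aware_port(&supported, true), Some(19142));
}
```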
Force-pushed: 8bc64c3 → 4919730
@havaker In a previous version of this PR, the As a future improvement, I think that
v2:
Adds NodeConnectionPool - an object which manages a pool of connections to a single node. It takes this responsibility over from Node and ConnectionKeeper - previously, the Node kept a collection of ConnectionKeepers, and each keeper managed a connection to a single shard.

The connection pool has the following advantages over the previous approach:

- It is able to effectively use the regular port for connections to Scylla. Sometimes the shard-aware port might not be available - not supported in an old version of Scylla, or not exposed/enabled. It might also be hidden behind a source-port-translating NAT, which renders the shard-choosing mechanism useless. A pool is a better abstraction for connecting to the regular port. When connecting to that port, Scylla assigns the shard for the connection and the driver has no control over the decision. The pool, when being refilled, tries to open as many connections as are needed to fill the remaining shards and assigns the resulting connections to the shards. The connection keeper manages only one connection to only one shard, and it cannot do this as effectively as the pool. It is worth mentioning that the pool _will_ use the shard-aware port if it is available. All things considered, the shard-aware port provides a superior method for establishing connections evenly distributed over all shards - the regular port should only be used if the shard-aware port cannot be used reliably. The pool is still able to use the shard-aware port as effectively as the ConnectionKeepers.
- It detects the shard-aware port automatically. Currently, the Session probes a random node to see if the shard-aware port is reachable and, based on that check, assumes either that no nodes support it, or that all nodes support it and have the same port number configured. The pool learns about the shard-aware port from the first connection and uses it for further connections. Because of this, the driver will now be able to use the new port in rolling-upgrade scenarios.
- It naturally supports more than one connection per shard for Scylla (and more than one connection per node in the case of Cassandra).
Exposes two new options in the configuration:

- connection_pool_size - the target connection pool size,
- disallow_shard_aware_port - setting it to true prevents the driver from using the shard-aware port at all.

Both options can also be set in the SessionBuilder.
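As a usage sketch (hedged: the option names are taken from the commit message, but the exact builder method names may differ from the crate's final API):

```rust
use scylla::SessionBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        // Target number of connections kept in each node's pool.
        .connection_pool_size(2)
        // Never use the shard-aware port, even if the node reports one.
        .disallow_shard_aware_port(true)
        .build()
        .await?;
    Ok(())
}
```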
Because the NodeConnectionPool automatically detects the shard-aware port of the node, there is no need to probe it during creation of the Session. This commit removes the probe.
Removes the ConnectionKeeper struct. Its only user was the TopologyReader, and now the new NodeConnectionPool is used in its place.
Force-pushed: 4919730 → 427a935
v2.1:
I did a series of manual tests - I used an application which writes at a steady rate, and it calls

Ran this scenario twice, with two pool sizes -

Ran this scenario with

Ran this scenario with

The application behaved in a sane way in all of the scenarios.
I think it's not too hard to spin up additional nodes with Docker using Command within a Rust test. Maybe add the test mentioned above while fixing #307.
Adding a simple distributed test in order to solve #307 is definitely a good idea, but anything more complex is very likely to burn our CI credits very quickly, so for now a set of manual tests is also acceptable. I'll proceed with merging, and after that we can start planning a
@psarna CI credits? GitHub Actions minutes are completely free for public repositories: https://github.com/pricing
Isn't there an upper bound on minutes used monthly, though? It would make sense for GitHub to place some sane limits in order to prevent abuse. If there isn't, then indeed we can add a bunch of distributed tests.
I don't think so! Yes! Distributed tests are missing for this crate; adding them would make it production-ready a lot faster :)
so cool, thank you :)
Reworks how the driver manages per-node connections.

A new object, NodeConnectionPool, is introduced which takes over the responsibility of managing connections from Node and ConnectionKeeper. The connection keeper was not removed, as it is still used for managing the control connection. The new management scheme has the following advantages over the previous approach (detailed in the NodeConnectionPool commit message above):
TODO: I did some minimal testing (node restart, node add/remove) but should probably do some more (resharding, shard-aware port unreachable, shard-aware port is disabled/enabled after restart). For now, marking as draft.
Pre-review checklist
- Add `Fixes:` annotations to the PR description.