-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch Session to new serialization traits #858
Conversation
5c2657e
to
ef0cb49
Compare
ef0cb49
to
6f14f8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit is quite large and hard to review. I think the changes there should be done in a more gradual fashion, utilizing the into_old_serialized_values
method more.
Also, the improvements in the proxy related to the detection of control connection could be sent in a separate PR in order to merge them more quickly. |
Opened #863 |
Ooops, misslick |
6f14f8c
to
0adab94
Compare
scylla/src/transport/connection.rs
Outdated
@@ -4,6 +4,7 @@ use scylla_cql::errors::TranslationError; | |||
use scylla_cql::frame::request::options::Options; | |||
use scylla_cql::frame::response::Error; | |||
use scylla_cql::frame::types::SerialConsistency; | |||
use scylla_cql::frame::value::SerializedValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connection.rs: Don't allow queries with values
After serialization refactor it will be impossible to perform unprepared
query with values, because serializing values will require knowing
column types.In order to make refactor easier and better split responsibility, this
commit removesvalues
arguments fromConnection
methods, so that
it is callers responsibility to prepare the query if necessary.
I'm not sure if I see why it makes the remaining refactor easier. It looks like you are simplifying the API of Connection
but this is just pushing this concern to the caller. This requires modifying lots of places in the code, including tests and query_filter_keyspace_name
which manually checks whether values is empty and dispatches to query
or execute
anyway.
I wonder whether it wouldn't be just simpler to drop this commit and then rework the last one, but I may be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that we don't want to send values in unprepared queries, and removing such capability from Connection
is a way to guarantee that. If the only users of this functionality are tests and query_filter_keyspace_name
, then I think it's not worth it to have this functionality.
I'll look into this matter again - I remember that some changes were less pleasant than I hoped, maybe reverting those changes in connections would help with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that we don't want to send values in unprepared queries, and removing such capability from
Connection
is a way to guarantee that.
What do you mean exactly by "send values in unprepared queries"? We are no longer doing that on the protocol level after #806 and #812 . If you mean to disallow calling .query()
with values, then I'm not sure if it makes sense to disallow it for Connection
but not for Session
(and AFAIK we didn't decide to do it for Session
at this point).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this commit opens us up to the problems like following:
topology::query_filter_keyspace_name
calls Connection::query_iter
, potentially with values.
Connection::query_iter
will call RowIterator::new_for_connection_query_iter
, which will,
for each page, call Connection::query_with_consistency
- which, if there are values,
prepares the query underneath to serialize them.
Such problems seem pretty hard to notice normally but removing values
from Connection pretty much prevents them.
Currently, I tried to revert removal of values
from Connection
and from RowIterator::new_for_query
, as it forced me to prepare a query globally in Session::query_iter
if there are values - now it is again prepared for each sub query.
Doing this created another problem I'm not yet sure how to solve: - if RowIterator::new_for_query
accepts values
, how do we get a size of it for request span? In either variant of Connection
API, new_for_query
won't serialize the values, Connection
will do it. There is no place where RowIterator
sees serialized values, so it can't record it's size for span.
For a user of Connection
struct (e.g. struct Session
), API without values
args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.
One advantage I see of removing values
from Connection
is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneath Connection::query*
methods, so we are more aware of it while writing other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this commit opens us up to the problems like following:
topology::query_filter_keyspace_name
callsConnection::query_iter
, potentially with values.Connection::query_iter
will callRowIterator::new_for_connection_query_iter
, which will, for each page, callConnection::query_with_consistency
- which, if there are values, prepares the query underneath to serialize them. Such problems seem pretty hard to notice normally but removingvalues
from Connection pretty much prevents them.
This issue is easy to fix - prepare the query if needed in Connection::query_iter
and then change RowIterator::new_connection_query_iter
to be RowIterator::new_connection_execute_iter
.
Currently, I tried to revert removal of
values
fromConnection
and fromRowIterator::new_for_query
, as it forced me to prepare a query globally inSession::query_iter
if there are values - now it is again prepared for each sub query.
This sounds like a harder problem because, unlike Connection::query_iter
, the iterator might switch to another connection when performing a query fails. Perhaps we could be smarter here and prepare the query when RowIteratorWorker::work
chooses a connection, but doing that sounds like too much work to be worth it, so your current approach looks OK for this one.
Doing this created another problem I'm not yet sure how to solve: - if
RowIterator::new_for_query
acceptsvalues
, how do we get a size of it for request span? In either variant ofConnection
API,new_for_query
won't serialize the values,Connection
will do it. There is no place whereRowIterator
sees serialized values, so it can't record it's size for span.
This problem should go away if you apply my first suggestion.
For a user of
Connection
struct (e.g. structSession
), API withoutvalues
args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.One advantage I see of removing
values
fromConnection
is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneathConnection::query*
methods, so we are more aware of it while writing other code.
You can look at it from another angle - having to do manual prepare in metadata fetch code sounds like polluting it with unnecessary details which should be handled on a lower level.
The examples with query_iter
convince me that it might be a good idea after all to automatically prepare on the highest layer instead on the lowest, because otherwise unnecessary preparing may occur. Though I'd prefer that query_iter
API did prepare automatically like I described (I see it as a user API in some sense, where the control connection is the user), there is an issue about changing internal CQL statements to use prepared statements (#417) so I suppose choosing to use Connection::{query_iter,execute_iter}
explicitly is acceptable for now.
To sum up - let's not pursue my original suggestion from my first comment in this thread, I don't think it's worth it anymore.
0adab94
to
b3a9d69
Compare
|
Please send this commit as a separate PR. It will introduce lots of conflicts with my other PRs, so let's merge this first so that I can rebase on top of it. |
This commit also adjusts `test_unusual_valuelists` in session_test.rs, because it used `&dyn Value` type explicitly.
We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.
This is in preparation to switch `Connection::execute_iter` to new serialization APIs.
…ew` methods to new serialization API This commit switches all those methods at once, because switching them one by one is too problematic - lifetime issue of PrimaryKey and temporary LegacySerializedValues.
There are fallback implementations of SerializeRow for LegacySerializedValues provided to help users transition to new traits. This commit updates most of the code that uses those implementations, so that removing them in the future is easier. Uses that are not removed: - Session::batch: will be removed when batches are switched to new API - value_tests::map_value_list: will be removed while removing LegacySerializedValues - serialize/row.rs tests: will be removed while removing LegacySerializedValues
During the switch to new serialization API `#[allow(dead_code)]` was added in several places as a temporary measure. Those can now be removed.
bd5a6c1
to
64c6ac6
Compare
Addressed review comments, fixed a book build errors (in a minimal way - I only introduced changes that were strictly necessary, I don't want to expand scope of this PR) |
This PR is based on #855 and is the main part of serialization refactor - switching public API to new trait.
The last commit makes the actual change. Previous ones are different changes that were necessary (or helpful) to make the switch.
There are still some things to do:
But the code changes are pretty much ready for review.
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.