-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new deserialization framework upper layer abstractions #1093
Introduce new deserialization framework upper layer abstractions #1093
Conversation
See the following report for details: cargo semver-checks output
|
@@ -47,7 +47,7 @@ use std::{ | |||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commit: iterator: adjust to the new deserialization framework
Unfortunately, due to the limitations of the Stream trait, currently the TypedRowIterator cannot deserialize types that need to borrow from the serialized frame contents.
I've played around with this, by removing RowT: 'static
bounds and trying to create a TypedRowStream
over some borrowed type (tried with &str
and my custom wrapper over &str
). I think that compiler error messages are not really insightful.. But this is only my opinion.
Funny thing is that compiler returns two different messages. For simply &str
, with the following snippet:
let typed_iter = raw_iter.into_typed::<(&str,)>().unwrap();
let mut typed_stream = typed_iter.into_stream();
let s = typed_stream.next().await;
what I get is:
error: implementation of `DeserializeRow` is not general enough
--> scylla/src/transport/connection.rs:1270:17
|
1270 | let s = typed_stream.next().await;
| ^^^^^^^^^^^^^^^^^^^ implementation of `DeserializeRow` is not general enough
|
= note: `(&str,)` must implement `DeserializeRow<'0>`, for any lifetime `'0`...
= note: ...but it actually implements `DeserializeRow<'1>`, for some specific lifetime `'1`
error: implementation of `DeserializeRow` is not general enough
--> scylla/src/transport/connection.rs:1270:37
|
1270 | let s = typed_stream.next().await;
| ^^^^^ implementation of `DeserializeRow` is not general enough
|
= note: `(&str,)` must implement `DeserializeRow<'0>`, for any lifetime `'0`...
= note: ...but it actually implements `DeserializeRow<'1>`, for some specific lifetime `'1`
, while for the snippet with my custom wrapper:
struct StrWrapper<'a> {
s: &'a str,
}
// implementations of DeserializeRow and DeserializeValue for StrWrapper
...
let typed_iter = raw_iter.into_typed::<(StrWrapper,)>().unwrap();
let mut typed_stream = typed_iter.into_stream();
let s = typed_stream.next().await;
I get:
error[E0599]: no method named `next` found for struct `TypedRowStream` in the current scope
--> scylla/src/transport/connection.rs:1270:30
|
1270 | let s = typed_stream.next().await;
| ^^^^ method not found in `TypedRowStream<(StrWrapper<'_>,)>`
|
::: scylla/src/transport/iterator.rs:1068:1
|
1068 | pub struct TypedRowStream<RowT> {
| ------------------------------- method `next` not found for this struct
|
= help: items from traits can only be used if the trait is implemented and in scope
= note: the following traits define an item `next`, perhaps you need to implement one of them:
candidate #1: `Iterator`
candidate #2: `StreamExt`
help: trait `StreamExt` which provides `next` is implemented but not in scope; perhaps you want to import it
|
1 + use futures::StreamExt;
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the second case (your wrapper), you must have forgotten to use futures::StreamExt as _
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Previously, I did not include the warning about unused import which appeared as well.
error[E0599]: no method named `next` found for struct `TypedRowStream` in the current scope
--> scylla/src/transport/connection.rs:1238:20
|
1238 | let s = ts.next().await;
| ^^^^ method not found in `TypedRowStream<(StrWrapper<'_>,)>`
|
::: scylla/src/transport/iterator.rs:1065:1
|
1065 | pub struct TypedRowStream<RowT> {
| ---------------------------------------- method `next` not found for this struct
|
= help: items from traits can only be used if the trait is implemented and in scope
= note: the following traits define an item `next`, perhaps you need to implement one of them:
candidate #1: `Iterator`
candidate #2: `StreamExt`
help: trait `StreamExt` which provides `next` is implemented but not in scope; perhaps you want to import it
|
1 + use futures::StreamExt;
|
warning: unused import: `futures::StreamExt`
--> scylla/src/transport/connection.rs:1235:13
|
1235 | use futures::StreamExt as _;
| ^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
d25a3d8
to
2a76aa4
Compare
v1.1: addressed @muzarski comments - thanks for them! |
Marked as draft, because #1101 must be merged first, and then this PR will be adjusted to two lifetime parameters in the deserialization framework. |
2a76aa4
to
ea163d6
Compare
v1.2: Rebased on main. |
ea163d6
to
1e045c2
Compare
v2.0:
|
1e045c2
to
a2123cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm impressed - mostly some questions/nits
let rows = pages_received | ||
.rows | ||
.into_legacy_rows(PagingStateResponse::NoMorePages)?; | ||
let page_received = receiver.recv().await.unwrap()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why this variable was previously called pages_received
(plural)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, no.
scylla/src/transport/query_result.rs
Outdated
/// A view over specification of a table in the database. | ||
#[derive(Debug, Clone, Copy)] | ||
#[cfg_attr(test, derive(PartialEq, Eq))] | ||
pub struct TableSpecView<'res> { | ||
table_name: &'res str, | ||
ks_name: &'res str, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the need for this and ColumnSpecView.
We will be able to modify TableSpec and ColumnSpec themselves if there is ever a need for it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need for this is precisely caused by your requirement about moving TableSpecs back into ColumnSpec.
If you remember, I suggested that we move TableSpec out of ColumnSpec (as in the current implementation of Scylla and Cassandra it's always the same for all columns - no support for joins). You agreed only on condition that we will be able to revert this without introducing breaking changes once Scylla or Cassandra introduces joins.
ColumnSpecView
will always borrow the keyspace name, no matter whether it's contained in ResultMetadata
directly or in a ColumnSpec
. Therefore, this abstraction layer defends us from the API breakage threat.
Conversely, if we returned ColumnSpec
to the user instead of ColumnSpecView
, then once we remove TableSpec
from ColumnSpec
, users will no longer be able to get table name and keyspace name from ColumnSpec
, so this breaks the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the benefit of removing TableSpec from ColumnSpec - it is borrowed anyway. Can we just keep it there and get rid of this additional layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is only borrowed (i.e., it contains Cow::Borrowed
variants) if the metadata itself is borrowed as a whole, which is not possible for e.g. RowIterator
(query_iter()
API) and for RowsDeserializer
in the owned variant. So, the additional layer saves possible allocations in such scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer true now that metadata is always borrowed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Apart from when being shared with PreparedStatement's cached metadata, result metadata is always deserialized in the borrowed form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remember that you mentioned that taking TableSpec out of ColumnSpec will also have positive impact on errors' size? I mean, there were more arguments to move TableSpec out of ColumnSpec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember that very well. In case of errors we probably still want to have the table name stored there - but I'm not sure. We can merge this PR as-is, and discuss later (but before release!) if it's worth it to introduce the Views.
I don't know what this is about. GitHub displays it as an alone message without reasonable context. |
That happens if, as a part of review, you respond in a discussion in some other review. In that case this comment is a response here: #1093 |
88447e8
to
a2123cf
Compare
The link points me to the PR as a whole, so again no luck in determining the context. |
a2123cf
to
d149059
Compare
v2.1: addressed another bunch of @muzarski comments. |
d149059
to
c1cfcdb
Compare
v2.2: rebased on main. |
1451865
to
2deb9ee
Compare
Soon, a new QueryResult will be introduced with a slightly different API. The old one will be preserved as LegacyQueryResult, in order to make migration easier. This commit renames the existing QueryResult to LegacyQueryResult, as well as the query_result module to legacy_query_result. The new QueryResult will be introduced in later commits.
IntoTypedRows and TypedRowIter, being helper items for LegacyQueryResult, are moved to legacy_query_result.rs from session.rs not to clutter session.rs. They are still re-exported in session.rs not to break compatibility.
This has the following consequences: 1) Legacy deserialization framework upper layer abstractions (LegacyQueryResult, LegacyRowIterator, etc.) will benefit from the lazy metadata deserialization, reducing number of allocations for metadata (but it's not going to have any impact, because the legacy deserialization framework allocates `Row`s anyway); 2) Legacy deserialization framework abstractions can no longer return `&[ColumnSpec<'static>]`, because the SelfBorrowedMetadataContainer contains borrowed `ColumnSpec`s; 3) In the next commits, conversion from the new abstractions to the legacy abstractions will be cheaper (wrt efficiency) and easier (wrt implementing it).
(Re)-introduces QueryResult. It is quite similar to the old (Legacy)QueryResult, but it keeps rows and metadata in an unparsed state (via Bytes) and has methods that allow parsing the contents using the new API. Helper method names are similar to what the old QueryResult had, just with the `_typed` suffix dropped - as now it is always required to provide the type of rows when parsing them, this suffix sounded redundant. There is one crucial change to the API. Motivation is as follows: `QueryResult` can represent a non-Rows response, so every rows-related operation on `QueryResult` may return "NonRowsResponse" error, which is inconvenient. The solution is to extract a new struct, `QueryRowsResult`. Thanks to that, one has to handle "NonRowsResponse" error only once, upon `QueryResult::into_rows_result()`. All further methods (`rows(), `single_row()`, etc.) may no longer fail with that error, which provides a clean step of conversion from any Result frame to Result:Rows frame. The drawback is that now there is a new call required in the call chain to deserialize a result, namely `.into_rows_result()`. Co-authored-by: Wojciech Przytuła <[email protected]>
New QueryResult tests are going to require result metadata serialization capabilities, as RawRows keep result metadata in a serialized form.
After the deserialization refactor with lazy result metadata deserialization, we will no longer have access to rows serialized size and rows count in Session and in RowIteratorWorker. Therefore, traces can no longer record that information. A further commit in this PR brings back serialized size to the span, but there is an important change: now the size is of both raw metadata and raw rows; before, metadata was not accounted.
Now, `result::deser_rows` returns RawRows instead of Rows, postponing any actual deserialization of response contents to a later time. The RawRows are pushed down the call stack, converted to the new QueryResult at some point and only converted to LegacyQueryResult where the API requires it. Co-authored-by: Wojciech Przytuła <[email protected]>
Like in the case of LegacyQueryResult, TypedRowIterator will be replaced with a new one, and the old one preserved for compatibility reasons. As a first step, it is renamed to LegacyTypedRowIterator. Similarly, RowIterator is renamed to LegacyRowIterator to mark that it is the part of the old API. There won't be a new RowIterator, though. Co-authored-by: Wojciech Przytuła <[email protected]>
In order to make further commits easier on the eyes, contents of the LegacyRowIterator::poll_next method are moved to the new poll_next_internal method. Co-authored-by: Wojciech Przytuła <[email protected]>
The iterator module has a small number of functions which return Poll<Option<Result<T, E>>>, call other functions that return such a stacked type and are support to continue if value obtained is not of form Ready(Some(Ok(x))). This requires a multiline pattern match instruction, which is basically boilerplate. This commit introduces the ready_some_ok! macro which hides the aforementioned boilerplate. Conceptually, it is similar to the existing std::task::ready! macro. Co-authored-by: Wojciech Przytuła <[email protected]>
This commit refactors the part responsible for acquiring the next page by the LegacyRowIterator to a different function. This change is a preparation necessary to support the new deserialization interface - there will be a method that can return deserialized type that borrows from the current iterator, and - for "lifetimes reasons" - acquiring the next page must be put into a separate method. Co-authored-by: Wojciech Przytuła <[email protected]>
Now, worker-related code comes strictly before iterator/stream-related code. This locality aid readability.
This commit makes the RowIteratorWorker pass raw rows to the main tokio task, instead of the eagerly deserialized Rows. The equivalent of the old RowIterator is now QueryPager. It cannot be conveniently iterated on, as it does not have any information about the column types. It features (yet not exposes) a `next()` method for deserializing consecutive `ColumnIterator`s. Users cannot manually perform deserialization using this method directly, because will be able to utilise the preferred (typed) API that will be added in the next commit. If they prefer manual deserialization, they will be be able to specify `ColumnIterator` as the deserialized row type, and proceed with manual deserialization from there. The legacy iterators are preserved by wrapping around QueryPager. Co-authored-by: Wojciech Przytuła <[email protected]>
Not to clutter iterator.rs with legacy abstractions, they are moved to a new module: `legacy`. I recommend viewing this commit **without** whitespace changes.
a47c6a4
to
6e4309e
Compare
I don't understand how this comment relates to the discussion that my comment was posted in. |
GitHub again displayed your comment as standalone, so I didn't know that it refers to any particular discussion 🙈 |
If you see that it is not possible to respond to a comment (there is no "Reply" text field) that means this comment was posted in some other thread which you need to find. |
This commit finishes the work related to adjusting the iterators module to the new deserialization framework. The previous commit brought RawRowsLendingStream, which can deserialize ColumnIterators. This commit introduces new TypedRowLendingStream, which type-checks once and then deserializes from ColumnIterators into rows. RawRowsLendingStream can be converted to TypedRowLendingStream by calling the `into_typed()` method. Unfortunately, due to the limitations of the Stream trait (no support for lending streams, analogous to lending iterators in case of RawRowsLendingIterator), a Stream cannot be used to deserialize borrowed types (i.e. those that borrow from the frame serialized contents). In order to give users both capabilities: 1) deserializing borrowed types (for efficiency), 2) deserializing using Stream (for convienience), two distinct types are used: TypedRowLendingStream and TypedRowStream. The first supports borrowed types and the second implements Stream. To sum up, instead of `RowIterator` (returning `Row`s) and `TypedRowIterator` (returning instances of the target type) both implementing `Stream`, now we have the following: - `RawRowsLendingStream` - cannot implement `Stream`, because returns `ColumnIterator`s that borrow from it, - provide `type_check()` and `next()` methods that can be used for low-level, manual deserialization (not recommended for ordinary users) - supports deserializing manually borrowed types (such as `&str`). - `TypedRowLendingStream` - created by calling `into_typed::<TargetType>()` on `RawIterator`, - type checks upon creation, - supports deserializing borrowed types (such as `&str`), - does not implement `Stream` in order to support borrowed types, - provides basic Stream-like methods (`next()`, `try_next()`), - `TypedRowStream` - created by calling `into_stream()` on `TypedRowLendingStream`, - implements `Stream` and hence does not support borrowed types. Co-authored-by: Piotr Dulikowski <[email protected]>
It is no longer needed. For compatibility with LegacyQueryResult and LegacyRowIterator, higher-layer conversions suffice.
Even though we can no longer record rows serialized size without accounting metadata, we can record their serialized size together.
6e4309e
to
b0fabe5
Compare
v3.2.1: exposed |
Contents
This PR:
Introduces
ResultMetadataHolder
, a versatile container forResultMetadata
, for use inQueryResult
etc.Introduces new lifetime-generic containers for Result:Rows response with varying degree of serialization/deserialization of metadata and rows:
RawMetadataAndRawRows
,DeserializedMetadataAndRawRows
,whose goal is to allow lazy deserialization of rows and deserializing metadata in the borrowed form (to reduce number of string allocations).
Introduces
RawRowsLendingIterator
(see description below).Introduces two main abstractions belonging to the upper layer of the new deserialization framework:
QueryResult
,TypedRowStream
,and prefixes their legacy counterparts with
Legacy_
.QueryResult
andTypedRowStream
down the internal code, converting them to legacy counterparts only when needed by the legacySession
's API.Details
RawRowsLendingIterator
It is added in tandem with the already present (the low level, scylla_cql one, not the
iterator.rs
one!)RowIterator
. The difference is thatRawRowsLendingIterator
owns the frame and the metadata, whereasRowIterator
borrows it. The lending one is needed in session iterator interface, and the borrowing one is used withQueryResult
.Achieving old behaviour for dynamic deserialization purposes
If one still wants to deserialize arbitrary rows (as the old framework would do) and only then try to convert them dynamically to some proper types, this is doable by using the old
Row
struct as the target type, e.g.ColumnSpecs
The new
ColumnSpecs
struct is introduced that provides view over columns specification, but exposes onlyTableSpecView
andColumnSpecView
structs instead ofTableSpec
andColumnSpec
, allowing us to modify the latter in the future without introducing breaking changes. One planned change is movingtable_spec
out ofcolumn_spec
for reduced memory footprint (see #956), but as CQL protocol allows different table specs for different columns and we can't predict whether joins won't be introduced at some point in the future, we retain this flexibility by imposing a new layer of abstraction (the views).New paging iterators
Instead of
RowIterator
(returningRow
s) andTypedRowIterator
(returning instances of the target type) both implementingStream
, now we have the following:RawLendingStream
Stream
, because returnsColumnIterator
s that borrow from it,type_check()
andnext()
methods that can be used for low-level, manual deserialization (not recommended for ordinary users)&str
).TypedRowLendingStream
into_typed::<TargetType>()
onRawLendingStream
,&str
),Stream
in order to support borrowed types.TypedRowStream
into_stream()
onTypedRowLendingStream
,Stream
and hence does not support borrowed types.TODO
Session'
s deserialization API and migrating internals APIs, tests and examples to it.Another step towards: #462
Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.