
Deserialization API rework #665

Closed
wants to merge 33 commits

Conversation

piodul
Collaborator

@piodul piodul commented Mar 17, 2023

This PR reworks the deserialization API. The new API solves some performance problems that the old one had and makes it possible to improve type safety. As this change will probably affect lots of existing code, the old API remains available, marked as "Legacy" - it's opt-in, and only some small changes are required to keep using it.

What are the problems with the current API?

Currently, when the driver receives a response from the database, it immediately deserializes it to Vec<Row>, where Row = Vec<CqlValue> and CqlValue is an enum able to represent any CQL type. The user can then iterate over this representation, converting rows with the FromRow trait. When a row is converted, it usually uses the FromCqlVal trait to convert individual column values. This design has the following problems:

  • The representation with rows and vecs is very inefficient. It incurs #rows + 1 allocations for the Vecs alone. Moreover, some column types allocate too - e.g. text (represented by a String) and blob (represented by a Vec<u8>), as well as collections and UDTs. Many of those allocations could be avoided entirely by deserializing data on demand: non-allocating alternatives such as &[u8] can be used to represent some types, collections can be consumed via iterators, UDTs can be deserialized directly to structs with the help of derive macros, and the need to allocate and deserialize everything up front can be avoided by iterating over the rows and deserializing them lazily.
  • FromCqlVal and FromRow receive incomplete information about the rows and columns. For example, the CqlValue::List variant keeps its values in a Vec, so it can't validate the element type if the vec is empty. A bigger issue with FromRow is that it doesn't know the names of the columns, so the FromRow derive macro can't validate that the column types match, nor can it support deserializing structs whose fields are in a different order than the columns.

Those issues couldn't really be fixed without an API redesign, which is why this PR was created.

The new API

The new API is based on two new traits, DeserializeCql and DeserializeRow:

pub trait DeserializeCql<'frame>
where
    Self: Sized,
{
    /// Checks whether this type can be deserialized from given CQL type.
    fn type_check(typ: &ColumnType) -> Result<(), ParseError>;

    /// Deserialize from given bytes.
    /// The function may assume that `type_check` was called and it succeeded.
    fn deserialize(
        typ: &'frame ColumnType,
        v: Option<FrameSlice<'frame>>,
    ) -> Result<Self, ParseError>;
}
pub trait DeserializeRow<'frame>
where
    Self: Sized,
{
    /// Checks whether this type can be deserialized from given row.
    fn type_check(specs: &[ColumnSpec]) -> Result<(), ParseError>;

    /// Deserialize from given bytes.
    /// The function may assume that `type_check` was called and it succeeded.
    fn deserialize(row: ColumnIterator<'frame>) -> Result<Self, ParseError>;
}

Instead of receiving pre-parsed data, both traits receive objects that borrow data from the serialized frame - hence the 'frame lifetime parameter. This allows the types being deserialized to allocate only when they need to (e.g. in the case of String). The new API also allows types that borrow data from the serialized frame - for example &[u8] or &str. There is a third option as well: the type being deserialized can share ownership of the serialized frame - for example, you can deserialize a blob to a bytes::Bytes, which will keep the original, serialized frame alive for as long as the value lives. This is easier to use than slices or strs, but one needs to be careful, as even a tiny blob will keep the memory for the whole serialized frame alive, which can lead to space leaks.
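
To make the three ownership strategies above concrete, here is a hedged sketch of a row type that uses all of them. The DeserializeRow derive macro is mentioned in this PR, but the exact derive path, attribute syntax and lifetime support shown here are assumptions, not the final API:

#[derive(DeserializeRow)]
struct MyRow<'frame> {
    // Owned: copies the column bytes into a fresh allocation.
    name: String,
    // Borrowed: a zero-copy view into the serialized frame.
    comment: &'frame str,
    // Shared ownership: cheap to obtain, but keeps the whole frame alive.
    payload: bytes::Bytes,
}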

Another notable change is the type_check method. In the case of DeserializeRow it receives more comprehensive information than FromRow did, including column names, so matching column names to struct fields is now possible - the derive macro for DeserializeRow has that functionality. Moreover, because type checking is separate from deserialization, it only needs to be done once per received response: if a page contains 1000 rows, type_check is called once per page while deserialize is called once per row. deserialize can assume that type_check has already succeeded and can skip some potentially expensive type checks.
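
The intended call pattern therefore looks roughly like the sketch below; only type_check and deserialize come from the traits above, while column_specs and rows stand in for the driver's internal plumbing and are hypothetical names:

// Once per received page:
MyRow::type_check(column_specs)?;
// Once per row, without repeating the potentially expensive checks:
for raw_row in rows {
    let row = MyRow::deserialize(raw_row)?; // raw_row: ColumnIterator<'frame>
    // ... use `row` ...
}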

Migration from the legacy API

Apart from the use of different traits, the new API is not that different. Some method names in QueryResult and RowIterator (now RawIterator) have been changed and some fields have been hidden, but otherwise the changes that need to be made are mostly mechanical - the commits that adjust the examples and tests are an example of that. Nonetheless, the old API has been around for a long time, so requiring folks to switch everything at once wouldn't be right in my opinion. This PR therefore preserves the old API - it's opt-in, and it's very easy to adjust existing code to use it: just do use scylla::LegacySession as Session, and similarly for the other types that have been marked as "legacy". There are also some facilities that make it possible to migrate the code piecewise - for example, you can create a new-API Session from a LegacySession which, although it has a different API, shares the same resources. A migration guide will also be included (it's still a TODO at this point, which is one of the reasons this is a draft PR).
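
For example, keeping a piece of code on the old API is a one-line change, as described above:

// Opt back into the legacy API for code that hasn't been migrated yet:
use scylla::LegacySession as Session;
// The rest of the module can stay unchanged and be migrated gradually.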

TODOs

  • Finish docstrings for the most important traits/types that were introduced
  • Update documentation and write a migration guide

Fixes: #462

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I added appropriate Fixes: annotations to PR description.

@wprzytula
Collaborator

wprzytula commented Mar 20, 2023

There is a typo in scylla-cql/types: introduce new deserialization traits commit description: DeserializeCql<'s> instead of DeserializeCql<'f>.
In the same description, this sentence ends suddenly: Rust's lifetime system makes sure that the the user doesn't deallocate the serialized response until using the .

@piodul piodul force-pushed the 462-new-deserialization-api branch from 00e71d8 to b3d824f Compare March 23, 2023 11:43
@piodul
Collaborator Author

piodul commented Mar 23, 2023

v0.1: rebased and fixed clippy's complaints

@piodul piodul force-pushed the 462-new-deserialization-api branch 2 times, most recently from 05a4ba8 to 86ec047 Compare March 30, 2023 08:42
@piodul
Collaborator Author

piodul commented Mar 30, 2023

v1:

  • Renamed LegacyXYZ stuff to Legacy08XYZ - if we have more legacy things in the future then it will be good to differentiate them somehow,
  • Adjusted existing documentation to the new API - the code examples now compile,
  • Added migration guide from 0.8 deserialization API,
  • Added documentation for the new derive macros,
  • Added the no_field_name_verification annotation for the derive macros,
  • Added docstrings to some of the missing items, slightly moved around some code,
  • Rebased on newest main.

@piodul
Collaborator Author

piodul commented Mar 30, 2023

There is a small number of TODOs that should be fixed in the near future, but I think that even without them the PR is good to go now (feel free to disagree). Moving out of draft.

@piodul piodul marked this pull request as ready for review March 30, 2023 09:00
@piodul piodul changed the title [WIP] Deserialization API rework Deserialization API rework Mar 30, 2023
@piodul piodul requested review from wprzytula and cvybhu March 30, 2023 09:00
Collaborator

@wprzytula wprzytula left a comment

So far I've perused only 4 commits, but thanks to their volume it's enough for a firm burst of comments. I'll continue the review in the coming days.

Comment on lines +838 to +850
impl<'frame> Iterator for UdtIterator<'frame> {
    type Item = Result<
        (
            &'frame (String, ColumnType),
            Option<Option<FrameSlice<'frame>>>,
        ),
        ParseError,
    >;

    fn next(&mut self) -> Option<Self::Item> {
        // TODO: Should we fail when there are too many fields?
Collaborator

Let's discuss this. Should we?

If we got more fields than we expected for a UDT, why should we let it silently pass? IMO it deserves at least a warning. Failure is appropriate as well, as I can't find any counterexample.

Contributor

There's a reason to allow fewer fields in the serialized form than there are in the type definition.
UDTs allow performing the ALTER TYPE operation, which makes it possible to add a new field to the type.
From that point on, all values of this UDT will have a value for the new field, but AFAIK we don't update the old values that were already stored in the table - that would require a costly full table scan. Instead, when reading a value with fewer fields than defined in the type, we just assume that the value for those fields is null.

Collaborator Author

@piodul piodul May 9, 2023

The protocol spec only allows fewer fields to be sent, and the description implies that it's incorrect to send more:

  A UDT value is composed of successive [bytes] values, one for each field of the UDT
  value (in the order defined by the type). A UDT value will generally have one value
  for each field of the type it represents, but it is allowed to have less values than
  the type has fields.

The TODO is mostly about considering how to handle a case where the database sends a response that does not conform to the spec. If we don't trust the DB then this method will have to be written a bit differently.

@wprzytula wprzytula self-requested a review March 31, 2023 08:54
@piodul piodul added this to the 1.0.0 milestone Apr 6, 2023
@piodul
Collaborator Author

piodul commented Apr 14, 2023

@cvybhu @wprzytula review ping

Comment on lines +381 to +380
// TODO: We could provide ResultMetadata in a similar, lazily
// deserialized form - now it can be a source of allocations
Collaborator

When do we plan to take care of this TODO?

Collaborator Author

I'd rather not do it in this PR. It's quite complicated already. I'll create an issue after this is merged.

@wprzytula wprzytula self-requested a review April 19, 2023 09:53
@piodul piodul force-pushed the 462-new-deserialization-api branch from 86ec047 to c25d5ad Compare April 21, 2023 14:56
@piodul
Collaborator Author

piodul commented Apr 21, 2023

v1.1: rebased on newest main (no other changes)

Contributor

@cvybhu cvybhu left a comment

Looks very nice, I got through some of it, will continue later.

Ok(MaybeEmpty::Value(v))
}
}
}
Contributor

Some types can't be empty, but MaybeEmpty doesn't forbid creating a MaybeEmpty with them. For example, it doesn't make sense to create a MaybeEmpty<String>: in the case of an empty string, the parsed value should be MaybeEmpty::Value("".to_string()), but instead it will be MaybeEmpty::Empty.
Perhaps we could introduce a trait, EmptyableCqlType, and have MaybeEmpty<T: EmptyableCqlType>?
This could protect against such mistakes at compile time.

Collaborator Author

Hmm, I remember that I considered it when implementing MaybeEmpty. I think the reason why I dropped it was that it's easy to forget to add EmptyableCqlType when adding a new impl DeserializeCql. Besides, while MaybeEmpty<String> doesn't make much sense from the CQL protocol point of view, it does work, and it should be pretty clear to the user what it does.

However, now that I think of it, it isn't a huge problem, and adding a missing impl EmptyableCqlType is easy and doesn't break the API. Moreover, unless users are well-versed in the CQL protocol spec (I wouldn't expect or require them to be), they won't know which types can be empty, so this might serve an informative purpose. In the future, if we rework the serialization API, we would rather use MaybeEmpty there too - and then EmptyableCqlType will be even more important, for serialization purposes.

To sum up - I'm convinced, I'll add a trait.
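
For reference, a minimal sketch of the marker-trait idea agreed on here; the names follow the discussion above and the final implementation may differ:

/// Marker for CQL types that can be "empty" per the CQL protocol
/// (e.g. numeric types). Text is excluded: a zero-length text value
/// is just an empty string, not an "empty" value.
trait EmptyableCqlType {}

impl EmptyableCqlType for i32 {}
impl EmptyableCqlType for i64 {}

/// With the bound in place, MaybeEmpty<String> no longer compiles.
enum MaybeEmpty<T: EmptyableCqlType> {
    Empty,
    Value(T),
}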

"Value is out of representable range for NaiveDate".to_string(),
)
})
}
Contributor

@cvybhu cvybhu Apr 22, 2023

The previous code for NaiveDate contained comments as to what's happening and why the - (1i64 << 31) doesn't panic. I think it would be good to keep them.

Collaborator Author

Isn't it obvious in the new code? The old code operated on i64s implicitly, i.e. it subtracted two chrono::Duration values. In the new code, we are operating directly on i64s.
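
For context, a short sketch of the arithmetic in question: the CQL date wire format is an unsigned 32-bit day count with the Unix epoch at 2^31, so widening to i64 before subtracting keeps the result well inside the representable range:

let raw: u32 = 1 << 31;                        // value read from the frame (here: the epoch itself)
// raw as i64 lies in [0, 2^32 - 1], so the difference lies in [-(2^31), 2^31 - 1]:
let days_since_epoch: i64 = raw as i64 - (1i64 << 31);
assert_eq!(days_since_epoch, 0);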

impl QueryResult {
    /// Returns the number of received rows, or `None` if the response wasn't of Rows type.
    pub fn rows_num(&self) -> Option<usize> {
        Some(self.raw_rows.as_ref()?.rows_count())
Contributor

I disagree, for me the current version is much easier to read. The more functional style requires more brainpower to parse, taking something and mapping it using a member function is a bit unnatural. OTOH having Some() with some code inside is a standard thing that's easy to parse.

Comment on lines +50 to +71
        match &self.raw_rows {
            Some(rows) => Ok(Some(rows.rows_iter()?)),
            None => Ok(None),
        }
Contributor

I think match is more readable than chaining functional combinators. Everyone knows what match does, and the current version does everything neatly step-by-step.
Functional combinators require the reader to have them freshly in their head. With imperative code you can understand what's going on just by looking at the sequence of instructions, but with functional stuff the reader might have to google how and_then differs from map, etc... I think the code should be as accessible as possible, even for people who don't write Rust all the time.

Comment on lines +90 to +92
/// If the UDT definition does not contain this field, it will be initialized
/// with `Default::default()`. __This attribute has no effect in `enforce_order`
/// mode.__
Contributor

Why not make it work with enforce_order?

I imagine that people who use enforce_order for the extra performance boost might need to use it too.
AFAIU the main use of default_when_missing is gracefully handling the ADD FIELD operation on UDTs.
When someone wants to add the field while the app is running they should:

  1. Add a new field to the rust struct, mark it as default_when_missing
  2. Roll out the new, updated version of the DB client
  3. Do the ADD FIELD operation in the database

In such a situation it's crucial to be able to use default_when_missing. We shouldn't penalize people who use straight order enforcing by making it impossible for them to handle such changes gracefully.
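
A hedged sketch of what the combination could look like on the user side; the attribute names come from this discussion, but the exact derive/attribute syntax is an assumption:

#[derive(DeserializeCql)]
#[scylla(enforce_order)]
struct Point {
    x: f64,
    y: f64,
    // Added later via ALTER TYPE; values written before the alter won't
    // contain it, so fall back to Default::default() instead of failing.
    #[scylla(default_when_missing)]
    z: f64,
}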

Contributor

Also if this combination isn't supported I think trying to do both of them at once should generate a compilation error instead of silently ignoring default_when_missing. Ignoring it might lead to some nasty surprises for the users.

Collaborator Author

Ok, we can support enforce_order + default_when_missing. If a struct field doesn't match the column name, we will initialize that field to Default::default() and will try to initialize the next field with that column.

        v: Option<FrameSlice<'frame>>,
    ) -> Result<Self, ParseError>;
}

Contributor

I'm a bit worried that the user has to remember to call type_check before doing deserialize. As the code gets more complex, with all the iterators and whatnot it's hard to prove to myself that type_check was indeed called.

I wonder if it would be possible to use the type system to make such mistakes impossible. Something like TypeChecked<T>() or something.
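
For illustration, a minimal sketch of that type-state idea; it is not part of the PR and only reuses the trait methods defined earlier:

use std::marker::PhantomData;

/// A token proving that T::type_check succeeded for a given set of column specs.
struct TypeChecked<T>(PhantomData<T>);

impl<T: for<'f> DeserializeRow<'f>> TypeChecked<T> {
    /// The only way to obtain the token is to run the type check.
    fn new(specs: &[ColumnSpec]) -> Result<Self, ParseError> {
        T::type_check(specs)?;
        Ok(TypeChecked(PhantomData))
    }

    /// Deserialization is only reachable through a successfully built token.
    fn deserialize<'f>(&self, row: ColumnIterator<'f>) -> Result<T, ParseError> {
        T::deserialize(row)
    }
}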

Collaborator Author

I don't think it is a problem for the users at all. We, as the developers, need to provide abstractions that always call type_check before deserialize and the users will be fine - and this PR provides them. Apart from implementing DeserializeCql/DeserializeRow themselves, I don't think there will ever be a need for the users to call type_check or deserialize directly.

@piodul piodul force-pushed the 462-new-deserialization-api branch from c25d5ad to b6a2495 Compare April 28, 2023 10:35
@piodul
Collaborator Author

piodul commented Apr 28, 2023

v1.2: rebased, fixed one test that used to fail in serverless mode

@piodul piodul force-pushed the 462-new-deserialization-api branch 2 times, most recently from 46e35a3 to 52941ee Compare April 28, 2023 10:57
piodul added 17 commits May 9, 2023 16:36
Both Session and Legacy08Session will support methods that allow
sending queries/prepared statements/batches and will share most of the
implementation - it's just that return types will be slightly different.

This commit moves the core of those methods to private methods `do_xyz`
for every `xyz` method from the API. This will allow implementing the
public methods for both API kinds with minimal boilerplate.
Adds Session as an alias over GenericSession<CurrentDeserializationApi>.
No methods (apart from the common ones) are added to it yet.
This commit renames the SessionBuilder::build method to build_legacy,
and then reintroduces the build method so that it returns the new
Session (not Legacy08Session).

All the examples, tests, documentation will gradually be migrated to
use SessionBuilder::build again in following commits.
The query/execute/batch statements are generic over the type of the
values that are meant to be serialized and sent to the cluster. They
started by converting the values to SerializedValues and then continued
without the need for generics. However, those functions used to be
non-trivial and would have to be monomorphised for every type of the
arguments passed to the method, increasing compilation time more than
necessary.

Now that most of the implementation was moved to do_query etc. methods,
we can restrict the generic part to the public query/execute/batch
methods which convert the input values to SerializedValues and then call
the non-generic do_query etc. methods. This commit does just that -
de-genericises do_query and friends, while leaving query and friends
generic as they used to be.
QueryResult can be converted to Legacy08QueryResult, but not the other
way around. In order to support both APIs, internal methods (do_query,
do_execute, etc.) need to be changed so that they return the new
QueryResult.
Implements methods related to sending queries for the new Session.
Adjusts the methods of Connection, apart from query_iter, to use the new
deserialization API. Connection is meant to be an internal API, so we
don't introduce a Legacy08Connection for this.
In a similar fashion to Session, CachingSession was also made generic
over the session kind.
Adjusts the CachingSession tests to use the new deserialization
interface.
The Connection::query_iter method is changed to use the new
deserialization framework. All the internal uses of it in topology.rs
are adjusted.
…acing info

Adjusts the Session::try_getting_tracing_info method to use the new
deserialization framework.
This is a large commit which goes over all existing tests that haven't
been migrated in previous commits and adjusts them to use the new
deserialization framework.

There were lots of changes to be made, but they are mostly independent
from each other and very simple.
This commit goes over all unadjusted examples and changes them to use
the new deserialization framework. Again, it contains a lot of changes,
but they are quite simple.
The Cluster struct by itself only serves as a facade for the
ClusterWorker, i.e. it has channels that allow sending requests to the
worker, receives the ClusterData via the `data` field etc. Apart from
the `_worker_handle` field, all other fields are cloneable. Two tasks
working on two copies of the same Cluster object should behave the same
as if they shared and operated on a single Cluster object (e.g. via
Arc<Cluster>).

This commit makes the Cluster object cloneable - the `_worker_handle` is
shared via an Arc. This will be very useful in the next commit - we will
do a similar thing for the GenericSession object.
In order to make migration from the old API easier and allow doing it
gradually, some components of the client programs would probably like to
use the old API while the new components will use the new API. However,
in the current design, Session and Legacy08Session are two different
types and it's not possible to "cast" one to another - even though they
have nearly the same fields and implementations.

The previous commit made Cluster cloneable, based on the observation
that it's perfectly fine to clone Cluster's fields, construct a new one
and treat it as a shared facade, handle to the same "semantic" cluster.
The same applies to Session, actually - cloning a session would have
similar effect (though we encourage users to keep Session in an Arc so
that cloning is cheaper).

Instead of making GenericSession cloneable, we introduce methods which,
in reality, perform a clone but change the kind of session's API. This
allows having two session objects which share the same resources but
have different APIs. This should be very useful when migrating large
projects to the new API - components that need to use the new API can
just "convert" the session to the new interface and use that.
Examples in the documentation are now adjusted to the new
deserialization API, and the examples there finally compile.
Adds a document that should help adjust users to the new deserialization
API.
@piodul piodul force-pushed the 462-new-deserialization-api branch from b6e9507 to 46b6e4d Compare May 9, 2023 14:37
@piodul
Collaborator Author

piodul commented May 9, 2023

v2.1: rebased on main

Collaborator

@wprzytula wprzytula left a comment

Apart from the commit introducing new procedural macros, I've completed the review.

Comment on lines +1240 to +1243
// UDF fields in correct same order
let mut udt_contents = BytesMut::new();
append_cell(&mut udt_contents, "The quick brown fox".as_bytes());
append_cell(&mut udt_contents, &42i32.to_be_bytes());
Collaborator

Did you mean: UDT fields?

While it's possible to implement those manually, the driver provides procedural macros for automatic derivation in some cases:

- `FromRow` - implements `FromRow` for a struct.
- `FromUserType` - generated an implementation of `FromCqlVal` for the struct, trying to parse the CQL value as a UDT.
Collaborator

s/generated/generates

Comment on lines +47 to +48
Note: the macros above have a default behavior that is different than what `FromRow` and `FromUserTypes` do.

Collaborator

This is unclear.

Collaborator

What is their default behaviour and what are the differences? Also, as the names here are similar both for macros and traits, it is crucial to distinguish them with some strong markers.

Comment on lines +51 to +52
The new API introduce two analogous traits that, instead of consuming pre-parsed `Vec<Option<CqlValue>>`, are given raw, serialized data with full information about its type. This leads to better performance and allows for better type safety.

Collaborator

What kind of information about the serialized data is given to the new API? Provide examples.

}
```

The above traits have been implemented for the same set of types as `FromRow` and `FromCqlVal`, respectively. Notably, `DeserializeRow` is implemented for `Row`, and `DeserializeCql` is implemented for `CqlValue`.
Collaborator

It can be helpful to remind readers that Row and CqlValue are representatives of the legacy API.

Collaborator

@wprzytula wprzytula left a comment

The new derive macros' code looks promising. I've also examined the generated code using cargo expand and it looks valid, too. Only minor nits there.

Comment on lines +464 to +465
// TODO: Allow collecting unrecognized fields into some special field

Collaborator

What kind of TODO is this one? An ad-kalendas-Graecas one, or something to be done in a follow-up?

@piodul
Collaborator Author

piodul commented Jul 27, 2023

I'll split this PR into multiple smaller ones. It's quite hard to keep going back to a single huge PR, rebasing it and addressing lots of comments at once, so it should make the process smoother. I'll address the existing comments in the upcoming, smaller PRs.

Labels
API-breaking This might introduce incompatible API changes

Successfully merging this pull request may close these issues.

Switch deserialization to an iterator-based interface to avoid allocations
4 participants