Skip to content

Commit

Permalink
docs: add migration guide for pre-0.11 serialization
Browse files Browse the repository at this point in the history
In 0.11, the serialization traits are reworked and some guidance will be
needed to adapt the existing code to the new traits. Add a document
which explains the reason behind the changes, potential issues to watch
for and tools that should make migration easier.
  • Loading branch information
piodul committed Dec 18, 2023
1 parent 345fd20 commit 86c1737
Show file tree
Hide file tree
Showing 5 changed files with 115 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/source/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@
- [Running Scylla using Docker](quickstart/scylla-docker.md)
- [Connecting and running a simple query](quickstart/example.md)

- [Migration guides](migration-guides/migration-guides.md)
- [Adjusting code to changes in serialization API introduced in 0.11](migration-guides/0.11-deserialization.md)

- [Connecting to the cluster](connecting/connecting.md)
- [Compression](connecting/compression.md)
- [Authentication](connecting/authentication.md)
Expand Down
1 change: 1 addition & 0 deletions docs/source/contents.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
retry-policy/retry-policy
speculative-execution/speculative
metrics/metrics
migration-guides/migration-guides
logging/logging
tracing/tracing
schema/schema
1 change: 1 addition & 0 deletions docs/source/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ Although optimized for Scylla, the driver is also compatible with [Apache Cassan

## Contents
* [Quick start](quickstart/quickstart.md) - Setting up a Rust project using `scylla-rust-driver` and running a few queries
* [Migration guides](migration-guides/migration-guides.md) - How to update the code that used an older version of this driver
* [Connecting to the cluster](connecting/connecting.md) - Configuring a connection to scylla cluster
* [Making queries](queries/queries.md) - Making different types of queries (simple, prepared, batch, paged)
* [Execution profiles](execution-profiles/execution-profiles.md) - Grouping query execution configuration options together and switching them all at once
Expand Down
100 changes: 100 additions & 0 deletions docs/source/migration-guides/0.11-serialization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# Adjusting code to changes in serialization API introduced in 0.11

## Background

When executing a statement through the CQL protocol, values for the bind markers are sent in a serialized, untyped form. In order to implement a safer and more robust interface, drivers can the information returned after preparing a statement to check the type of the data provided by the user against the actual types of the bind markers.

Before 0.11, the driver couldn't do this kind of type checking. For example, in the case of non-batch queries, the only information about the user data it has is that it implements `ValueList` - defined as follows:

```rust
pub trait ValueList {
fn serialized(&self) -> SerializedResult<'_>;
fn write_to_request(&self, buf: &mut impl BufMut) -> Result<(), SerializeValuesError>;
}
```

The driver would naively serialize the data and hope that the user took care to send correct types of values. Failing to do so would, in the best case, fail on the DB-side validation; in the worst case, the data in its raw form may be reinterpreted as another type in an unintended manner.

Another problem is that the information from the prepared statement response is required to robustly serialize user defined types, as UDTs require their fields to be serialized in the same order as they are defined in the database schema. The `IntoUserType` macro which implements Rust struct -> UDT serialization just expects that the order of the Rust struct fields matches the schema, but ensuring this can be very cumbersome for the users.

In version 0.11, a new set of traits is introduced and the old ones are deprecated. The new traits receive more information during serialization such as names of the column/bind markers and their types, which allows to fix the issues mentioned in the previous section.

## Old vs. new

Both the old and the new APIs are based on three core traits:

| Old API | New API | Description |
| ---------------------------------------------------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Value` | `SerializeCql` | A type that can serialize itself to a single CQL value. For example, `i32` serializes itself into a representation that is compatible with the CQL `int` type. |
| `ValueList` | `SerializeRow` | A type that can serialize itself as a list of values for a CQL statement. For example, a `(i32, &str)` produces a list of two values which can be used in a query with two bind markers, e.g. `SELECT * FROM table WHERE pk = ? AND ck = ?`. Optionally, values in the produced list may be associated with names which is useful when using it with a query with named bind markers, e.g. `SELECT * FROM table WHERE pk = :pk AND ck = :ck`. |
| `LegacyBatchValues` (previously named `BatchValues`) | `BatchValues` | Represents a source of data for a batch request. It is essentially equivalent to a list of `ValueList`, one for each statement in the batch. For example, `((1, 2), (3, 4, 5))` can be used for a batch with two statements, the first one having two bind markers and the second one having three. |

All methods which take one of the old traits were changed to take the new trait - notably, this includes `Session::query`, `(Caching)Session::execute`, `(Caching)Session::batch`.

The driver comes a set of `impl`s of those traits which allow to represent any CQL type (for example, see [Data Types](../data-types/data-types.md) page for a list of for which `Value` and `SerializeCql` is implemented). If the driver implements an old trait for some type, then it also provides implements the new trait for the same type.

## Migration scenarios

### Different default behavior in `SerializeRow`/`SerializeCql` macros

By default, the `SerializeRow` and `SerializeCql` **will match the fields in the Rust struct by name to bind marker names** (in case of `SerializeRow`) **or UDT field names** (in case of `SerializeCql`). This is different from the old `ValueList` and `IntoUserType` macros which did not look at the field names at all and would expect the user to order the fields correctly. While the new behavior is much more ergonomic, you might have reasons not to use it.

> **NOTE:** The deserialization macro counterparts `FromRow` and `FromUserType` have the same limitation as the old serialization macros - they require struct fields to be properly ordered. While a similar rework is planned for the deserialization traits in a future release, for the time being it might not be worth keeping the column names in sync with the database.
In order to bring the old behavior to the new macros (the only difference being type checking which cannot be disabled right now) you can configure it using attributes, as shown in the snippet below:

```rust
// The exact same attributes apply to the `SerializeRow` macro and their
// effect is completely analogous.
#[derive(SerializeCql)]
#[scylla(flavor = "enforce_order", skip_name_checks)]
struct Person {
name: String,
surname: String,
age: i16,
}
```

Refer to the API reference page for the `SerializeRow` and `SerializeCql` macros in the `scylla` crate to learn more about the supported attributes and their meaning.

### Preparing is mandatory with a non-empty list of values

> **NOTE:** The considerations in this section only concerns users of the `Session` API, `CachingSession` is not affected as it already does preparation before execute and caches the result.
As explained in the [Background](#background) section, the driver uses data returned from the database after preparing a statement in order to implement type checking. As the new API makes type checking mandatory, **the driver must prepare the statement** so that the data for the bind markers can be type checked. It is done in case of the existing methods which used to send unprepared statements: `Session::query` and `Session::batch`.

> **NOTE:** The driver will skip preparation if it detects that the list of values for the statement is empty, as there is nothing to be type checked.
If you send simple statements along with non-empty lists of values, the slowdown will be as follows:

- For `Session::query`, the driver will optionally prepare the statement before sending it, incurring an additional round-trip.
- For `Session::batch`, the driver will send a prepare request for each *unique* unprepared statement with a non-empty list of values. **This is done serially!**

In both cases, if the additional roundtrips are unacceptable, you should prepare the statements beforehand and reuse them - which aligns with our general recommendation against use of simple statements in performance sensitive scenarios.

### Migrating from old to new traits *gradually*

In some cases, migration will be as easy as changing occurrences of `IntoUserType` to `SerializeCql` and `ValueList` to `SerializeRow` and adding some atributes for procedural macros. However, if you have a large enough codebase or some custom, complicated implementations of the old traits then you might not want to migrate everything at once. To support gradual migration, the old traits were not removed but rather deprecated, and we introduced some additional utilities.

#### Converting an object implementing an old trait to a new trait

We provide a number of newtype wrappers:

- `ValueAdapter` - implements `SerializeCql` if the type wrapped over implements `Value`,
- `ValueListAdapter` - implements `SerializeRow` if the type wrapped over implements `ValueList`,
- `LegacyBatchValuesAdapter` - implements `BatchValues` if the type wrapped over implements `LegacyBatchValues`.

Note that these wrappers are not zero cost and incur some overhead: in case of `ValueAdapter` and `ValueListAdapter`, the data is first written into a newly allocated buffer and then rewritten to the final buffer. In case of `LegacyBatchValuesAdapter` there shouldn't be any additional allocations unless the implementation has an efficient, non-default `Self::LegacyBatchValuesIterator::write_next_to_request` implementation (which is not the case for the built-in `impl`s).

Naturally, the implementations provided by the wrappers are not type safe as they directly use methods from the old traits.

Conversion in the other direction is not possible.

#### Custom implementations of old traits

It is possible to directly generate an `impl` of `SerializeRow` and `SerializeCql` on a type which implements, respectively, `ValueList` or `Value`, without using the wrappers from the previous section. The following macros are provided:

- `impl_serialize_cql_via_value` - implements `SerializeCql` if the type wrapped over implements `Value`,
- `impl_serialize_row_via_value_list` - implements `SerializeRow` if the type wrapped over implements `ValueList`,

The implementations are practically as those generated by the wrappers described in the previous section.
10 changes: 10 additions & 0 deletions docs/source/migration-guides/migration-guides.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Migration guides

- [Serialization changes in version 0.11](0.11-serialization.md)

```eval_rst
.. toctree::
:hidden:
:glob:
0.11-serialization
```

0 comments on commit 86c1737

Please sign in to comment.