Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to do partial deserialization #1146

Open
73rr0r opened this issue Dec 9, 2024 · 15 comments
Open

How to do partial deserialization #1146

73rr0r opened this issue Dec 9, 2024 · 15 comments
Labels

Comments

@73rr0r
Copy link

73rr0r commented Dec 9, 2024

Hey, I'm struggling with deserialization.

I'm making a API with Axum that, upon request, can return records from a ScyllaDB based on some criteria and with certain fields. for example I can request records with specific IDs and fields, let's say request to my API would look like

{
  "columns": ["id", "action_ts_utc", "name", "description", "for_distribution"],
  "keys": [
    ["some_uuid_here1"],
    ["some_uuid_here2"],
    ["some_uuid_here3"],
    ["some_uuid_here4"]
  ]
}

And my rust struct look like

#[derive(DeserializeRow)]
pub struct MyTableRow {
    pub id: String,
    pub action_ts_utc: DateTime<Utc>,
    pub joined_ts_utc: DateTime<Utc>,
    pub is_active: bool,
    pub name: String,
    pub description: String,
    pub for_distribution: bool,
    pub type: String,
}

So my select query would be like

SELECT id, action_ts_utc, name, description, for_distribution FROM myks.mytable WHERE id IN ('some_uuid_here1', 'some_uuid_here2', 'some_uuid_here3', 'some_uuid_here4');

And i will get an error, something like

TypeCheckError: Failed to type check the Rust type myapi::api::handlers::tables::mytable::schema::MyTableRow against CQL column types [Text, Boolean, Uuid, Int, Timestamp] : wrong column count: the statement operates on 5 columns, but the given rust types contains 1%  

Types can be different, it's just an example

I understand error, which says that columns I selected doesn't match with struct, and something like following not working too.

#[scylla(flavor = "enforce_order", skip_name_checks)]

Making struct fields optional doesn't working, also as #[scylla(skip)]

So how do i properly handle partial deserialization?

@73rr0r 73rr0r changed the title How to do partially deserialization How to do partial deserialization Dec 9, 2024
@Lorak-mmk
Copy link
Collaborator

@wprzytula this sounds like we should add #[scylla(allow_missing)] to DeserializeRow too, now it is only available in DeserializeValue.

@Lorak-mmk Lorak-mmk added enhancement New feature or request area/deserialization labels Dec 9, 2024
@73rr0r
Copy link
Author

73rr0r commented Dec 9, 2024

@wprzytula this sounds like we should add #[scylla(allow_missing)] to DeserializeRow too, now it is only available in DeserializeValue.

Thanks for fast reply!
#[scylla(allow_missing)] would be perfect!

How often do you release? And how long can I expect to see a new version with this feature?

@wprzytula
Copy link
Collaborator

TypeCheckError: Failed to type check the Rust type myapi::api::handlers::tables::mytable::schema::MyTableRow against CQL column types [Text, Boolean, Uuid, Int, Timestamp] : wrong column count: the statement operates on 5 columns, but the given rust types contains 1%

What I can't understand is that 1%.
This is what should be written:

write!(f, "wrong column count: the statement operates on {cql_cols} columns, but the given rust type contains {rust_cols}")

Looking at your struct, I'd expect 8, not 1%. What does the percent sign even come from???

@Lorak-mmk
Copy link
Collaborator

What I can't understand is that 1%. This is what should be written:

write!(f, "wrong column count: the statement operates on {cql_cols} columns, but the given rust type contains {rust_cols}")

Looking at your struct, I'd expect 8, not 1%. What does the percent sign even come from???

It may be a simple error when copying and pasting the error. Is that really relevant to the issue?

@wprzytula
Copy link
Collaborator

What I can't understand is that 1%. This is what should be written:
write!(f, "wrong column count: the statement operates on {cql_cols} columns, but the given rust type contains {rust_cols}")
Looking at your struct, I'd expect 8, not 1%. What does the percent sign even come from???

It may be a simple error when copying and pasting the error. Is that really relevant to the issue?

It is, because I don't understand what's the underlying problem.

If I understand the case correctly, #[scylla(skip)] applied on non-deserialized fields should be the solution to the described issue. But @73rr0r mentioned that skip is not doing what he wants to achieve - so I'm wondering why.

@Lorak-mmk
Copy link
Collaborator

Lorak-mmk commented Dec 9, 2024

It may be a simple error when copying and pasting the error. Is that really relevant to the issue?

It is, because I don't understand what's the underlying problem.

If I understand the case correctly, #[scylla(skip)] applied on non-deserialized fields should be the solution to the described issue. But @73rr0r mentioned that skip is not doing what he wants to achieve - so I'm wondering why.

The way I understand it the user want's to dynamically select a subset of columns for select statement. So if the API user sends

{
  "columns": ["id", "action_ts_utc", "name", "description", "for_distribution"],
  "keys": [
    ["some_uuid_here1"],
    ["some_uuid_here2"],
    ["some_uuid_here3"],
    ["some_uuid_here4"]
  ]
}

then the query will be

SELECT id, action_ts_utc, name, description, for_distribution FROM myks.mytable WHERE id IN ('some_uuid_here1', 'some_uuid_here2', 'some_uuid_here3', 'some_uuid_here4');

and if the API user sends

{
  "columns": ["id", "joined_ts_utc", "for_distribution", "type"],
  "keys": [
    ["some_uuid_here1"],
    ["some_uuid_here2"],
    ["some_uuid_here3"],
    ["some_uuid_here4"]
  ]
}

then the query will be:

SELECT id, joined_ts_utc, for_distribution, type FROM myks.mytable WHERE id IN ('some_uuid_here1', 'some_uuid_here2', 'some_uuid_here3', 'some_uuid_here4');

Now the problem is how to deserialize it. Creating a separate struct for each subset of column is infeasible. Using something like #[scylla(allow_missing)] seems like a perfect fit here.

@wprzytula
Copy link
Collaborator

The way I understand it the user want's to dynamically select a subset of column for select statement.

Makes sense. Then, implementing allow_missing's support for DeserializeRow would indeed help here.

@73rr0r
Copy link
Author

73rr0r commented Dec 9, 2024

TypeCheckError: Failed to type check the Rust type myapi::api::handlers::tables::mytable::schema::MyTableRow against CQL column types [Text, Boolean, Uuid, Int, Timestamp] : wrong column count: the statement operates on 5 columns, but the given rust types contains 1%

What I can't understand is that 1%. This is what should be written:

write!(f, "wrong column count: the statement operates on {cql_cols} columns, but the given rust type contains {rust_cols}")
Looking at your struct, I'd expect 8, not 1%. What does the percent sign even come from???

Yeah, look like my terminal displays % or something like that somehow. In my first message i slightly renamed columns and struct itself just for simplicity, here is a real column names. For privacy purposes I'am not gonna share with you all of my real code, I'll share only specific part.

Here is a real struct:

#[derive(DeserializeRow, Debug)]
#[scylla(flavor = "enforce_order", skip_name_checks)] // https://docs.rs/scylla/latest/scylla/derive.DeserializeRow.html#struct-attributes
pub struct VideoMetaRow {
    pub video_id: Uuid,
    #[scylla(skip)]
    pub channel_id: Option<String>,
    #[scylla(skip)]
    pub origin_type: Option<String>,
    #[scylla(skip)]
    pub created_ts_utc: Option<DateTime<Utc>>,
    #[scylla(skip)]
    pub tv_id: Option<String>,
    #[scylla(skip)]
    pub season: Option<i32>,
    #[scylla(skip)]
    pub episode: Option<i32>,
    #[scylla(skip)]
    pub category_id: Option<i32>,
    #[scylla(skip)]
    pub title: Option<String>,
    #[scylla(skip)]
    pub publication_ts_utc: Option<DateTime<Utc>>,
    #[scylla(skip)]
    pub is_active: Option<bool>,
    #[scylla(skip)]
    pub is_adult: Option<bool>,
    #[scylla(skip)]
    pub duration_sec: Option<i32>,
    #[scylla(skip)]
    pub has_high_quality: Option<bool>,
    #[scylla(skip)]
    pub has_preview: Option<bool>,
}

Here is a two requests to my API, first is returning correct result and second is not:

curl -X 'POST' \
  'http://0.0.0.0:8000/feature_set/video_meta' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "features": ["video_id"],
    "keys": [["k1"], ["k2"], ["k3"], ["k4"]]
  }'
curl -X 'POST' \
  'http://0.0.0.0:8000/feature_set/video_meta' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "features": ["video_id", "channel_id"],
    "keys": [["k1"], ["k2"], ["k3"], ["k4"]]
  }'

And the SQL statements which is generated for the first and second requests:

SELECT video_id FROM video_meta WHERE video_id IN (?);
SELECT video_id, channel_id FROM video_meta WHERE video_id IN (?);

And my responses from API:

[["k1"],["k2"]]% 
Repository error: Query error: TypeCheckError: Failed to type check the Rust type fsrs_api::api::handlers::feature_set::video_meta::schema::VideoMetaRow against CQL column types [Uuid, Int] : wrong column count: the statement operates on 2 columns, but the given rust types contains 1% 

As you can see % is only in my logs in terminal. About meaning of 1, I don't see any patterns for e.g. increasing the number of columns.

Also i have to mention that I'm using CachingSession with prepared statements and session.execute_iter(query, values).await?.rows_stream::<T>()?

I tried to:

  1. Marking fields as Optional.
  2. Using #[scylla(flavor = "enforce_order", skip_name_checks)].
  3. Using #[scylla(skip)] with #[scylla(flavor = "enforce_order", skip_name_checks)].
  4. Everything together.

And nothing helped to me.
Then i read source code and realized that there is not implementation to allow select not all fields from ScyllaDB and then use new deserialization API. With old API that was possible, i guess with FromRow, but I am not sure, because I did not use old deserialization API.

Also, found no info in docs about macro attributes like #[scylla(skip)]. I don't rule out the possibility that I might be blind ;) But would be good to add some docs about deserialization aspects.

My service is a simple Axum application.

You can think about it as a layer between ScyllaDB and my other microservices. It maps incoming request into SQL statement and then returns deserialized result in a response as only values, a.k.a records, e.g. [ ["k1", 123], ["k2", 456] ] to reduce size of response.

Of course my code can be not the best example of Rust programming, but still i think ScyllaDB driver should give an opportunity of partial deserialization, maybe marking fields as Optional or with #[scylla(allow_missing)] would be a good approach.
As i said I am not an Rust expert, but in other languages i know, this feature is available.

Hope now i provided enough context to you. If any other questions, feel free to ask.

@wprzytula
Copy link
Collaborator

wprzytula commented Dec 9, 2024

Also, found no info in docs about macro attributes like #[scylla(skip)]. I don't rule out the possibility that I might be blind ;) But would be good to add some docs about deserialization aspects.

The attributes are described here: https://docs.rs/scylla/latest/scylla/derive.DeserializeValue.html, https://docs.rs/scylla/latest/scylla/derive.DeserializeRow.html

@73rr0r
Copy link
Author

73rr0r commented Dec 9, 2024

Also, found no info in docs about macro attributes like #[scylla(skip)]. I don't rule out the possibility that I might be blind ;) But would be good to add some docs about deserialization aspects.

The attributes are described here: https://docs.rs/scylla/latest/scylla/derive.DeserializeValue.html, https://docs.rs/scylla/latest/scylla/derive.DeserializeRow.html

Yep, thats right, I found them exactly in crate docs, thats not very obvious, because the first link in Google leads to website i mentioned earlier ;). For me thats fine, but for simple users i guess thats not obvious to check macro attributes somewhere in crate docs, it's like finding Rust docstrings in source code at GitHub repo I think, only experienced users do that. But that only my opinion, not the real problem.

@wprzytula
Copy link
Collaborator

i guess thats not obvious to check macro attributes somewhere in crate docs, it's like finding Rust docstrings in source code at GitHub repo I think, only experienced users do that.

Hmm, that's surprising. For us it's been intuitive that crate docs are a basic source of knowledge about a Rust crate.

@Lorak-mmk
Copy link
Collaborator

@wprzytula this sounds like we should add #[scylla(allow_missing)] to DeserializeRow too, now it is only available in DeserializeValue.

Thanks for fast reply! #[scylla(allow_missing)] would be perfect!

How often do you release? And how long can I expect to see a new version with this feature?

We are currently focusing our efforts on stabilizing the library and releasing 1.0 version.
#[scylla(allow_missing)] for DeserializeRow is an addition that is backwards-compatible, se we won't be able to prioritize it before releasing 1.0. You are welcome to send a PR with this change of course.

@73rr0r
Copy link
Author

73rr0r commented Dec 10, 2024

@wprzytula this sounds like we should add #[scylla(allow_missing)] to DeserializeRow too, now it is only available in DeserializeValue.

Thanks for fast reply! #[scylla(allow_missing)] would be perfect!
How often do you release? And how long can I expect to see a new version with this feature?

We are currently focusing our efforts on stabilizing the library and releasing 1.0 version. #[scylla(allow_missing)] for DeserializeRow is an addition that is backwards-compatible, se we won't be able to prioritize it before releasing 1.0. You are welcome to send a PR with this change of course.

I would love to try to contribute in this lib, but I think my skills are not enough for this :)
I started learning Rust like couple weeks ago and came from Python, C#, Javascript, Typescript and NodeJS. And in my last project with my team we used Python wrapper for your Rust scylla lib, but it's kinda slow and other Python frameworks was kinda slow too, so I decided to rewrite it to Rust. So I can try to contribute, but it will took to much time and it will be faster if someone experienced in Rust will do this, so thats why I asked about your release cycle.

Btw after all known languages I am becoming a fun of Rust lang

@73rr0r
Copy link
Author

73rr0r commented Dec 10, 2024

i guess thats not obvious to check macro attributes somewhere in crate docs, it's like finding Rust docstrings in source code at GitHub repo I think, only experienced users do that.

Hmm, that's surprising. For us it's been intuitive that crate docs are a basic source of knowledge about a Rust crate.

Yeah, it's obvious for me too, but for example I have some colleagues for whom this was not obvious, probably because they are like jun-middle level. So i guess people who are going deep dive into the Rust not gonna realize immediately what he has to look at crate docs instead of official db docs.

It's looks like a holywar about where should be the source of truth for libs and services, in some Confluence, some local wiki or something like that or in the code and docstrings itself.

Personally i prefer it in both places, but with links to source code docs in Confluence for example.

@73rr0r
Copy link
Author

73rr0r commented Dec 10, 2024

Overall if I will have some time in non-working hours, maybe I will try to contribute, but I am not 100% sure about it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants