Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to handle multiple data types simultaneously #5

Open
ivarflakstad opened this issue Jan 4, 2023 · 0 comments
Open

Document how to handle multiple data types simultaneously #5

ivarflakstad opened this issue Jan 4, 2023 · 0 comments

Comments

@ivarflakstad
Copy link
Contributor

ivarflakstad commented Jan 4, 2023

Wall of text incoming!

TLDR

My conclusion is this:
We should inform users that if they have identical structs within an enum the de/serialization may not work as they expect.
If they absolutely must have identical structs in an enum they have to use #serde[tag] and remember to write their queries with this in mind (i.e. remember to add type: $type). Otherwise it will not deserialize the result of a select correctly, but then again serde will error if the tag is not included with Error("missing field type", line: 0, column: 0), so with a bit of documentation I think it's fair to give the responsibility to the user.

Problem

With a query (like MATCH p=()-->() return p) you will get all vertices and edges in the db.
For example let's say we get "A -[E]->[F]-> B".

We can't fit both the A and B types into V directly, so we have to use an enum or perhaps Box<dyn SomeTrait>. Same goes for edges with E and F.
I think enums are the correct choice here.

// We already have the structs defined above.

#[derive(Deserialize, Serialize, PartialEq)]
enum ExampleVertex {
    A(A),
    B(B)
}

#[derive(Deserialize, Serialize, PartialEq)]
enum ExampleEdge {
    E(E),
    F(F),
}

// `result` holds the result from querying "MATCH p=()-->() return p",
for row in result {
    let path: Path<ExampleVertex, ExampleEdge> = row.get(0);
}

So far so good.

However, if any of our structs are evaluated as identical by serde it will deserialize to the first struct that matches the data in the row.

#[derive(Deserialize, Serialize)]
struct A {
    id: usize
}

#[derive(Deserialize, Serialize)]
struct B {
    id: usize
}

#[derive(Deserialize, Serialize)]
enum ExampleEnum {
    A(A)
    B(B)
}

The contents of a B could be deserialized into a A here, resulting in something like
Vertex { id: 123456789, label: "B", properties: A(A { id: 1 }) }

Suggested solution

The absolutely simplest solution is obviously to recommend that users of the library do not use identical structs within an enum 😆

But if we want to support it the best way to deal with this (in my opinion) is to inform the users that they can use serde tagging.

#[derive(Deserialize, Serialize, PartialEq)]
#[serde(tag = "type")]
enum ExampleEnum {
    A(A)
    B(B)
}

If we now create an instance of B that includes the corresponding type field

CREATE(b:B {id: 1, type:'B'})  // type:'B' is the important part
MATCH (b: B) WHERE b.id = 1 RETURN b

We can now query using our enum and will deserialize to the correct enum variant.

let b_instance: Vertex<ExampleVertex> = row.get(0);

Resulting in

Vertex { id: 844424930131969, label: "B", properties: B(B { id: 1 }) }

I've tested this approach and I am pretty sure that this works in all cases, but we should definitely add some tests.

Also it's important to remember that if you're using AgType to save models you must remember to include the type in the query.

fn insert_example_vertex(v: ExampleVertex) {
  client.execute_cypher(
      'my_graph',
      "CREATE(b: B { id: $id, type: $type })",  // <- here
      Some(AgType::<ExampleVertex>(v)
  )
}

Forgetting $type will cause the enum to not be deserialized correctly, but then again serde will error if the tag is not included with Error("missing field type", line: 0, column: 0), so with a bit of documentation I think it's fair to give the responsibility to the user.

Alternatives

For completeness I'm including some alternative approaches that I personally think aren't as good.

Deserialize twice

You can deserialize to a Vertex first (or a serde json Value), then match on the label, and finally deserialize that into the correct model.
Note that I haven't properly tested this technique, but it should work.

fn from_label(vertex_string: str) -> ExampleVertex {
    // Using Vertex<String> here because we're only interested in the label
    match serde_json::from_str::<Vertex<String>>(vertex_string) {
        Ok(vertex) => {
            match vertex.label() {
                "A" => match serde_json::from_str::<Vertex<A>>(vertex_string) {
                    Ok(a) => Ok(ExampleVertex::A(a)),
                    Err(e) => Err(e),
                },
                "B" => match serde_json::from_value::<B>(value) {
                    Ok(b) => Ok(ExampleVertex::B(b)),
                    Err(e) => Err(e),
                },
            }
        }
    }
}

I do not think this is as good an approach as serde tagging because it is

  1. Ugly.
  2. Annoying to require the users of the lib to write all this boilerplate.

Custom de/serializer

We could create custom de/serializer that inspects the label and uses that de/serialize correctly.
Using something like strum AsRefStr would be useful, but we then we would have to differentiate between simple structs and enums within the method.
I suspect this will be quite a bit of extra work, so I don't recommend this approach.

@Dzordzu Dzordzu added this to the Good documentation milestone Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants