Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the correct version of PreparedMetadata for each node #815

Open
cvybhu opened this issue Sep 20, 2023 · 5 comments
Open

Use the correct version of PreparedMetadata for each node #815

cvybhu opened this issue Sep 20, 2023 · 5 comments

Comments

@cvybhu
Copy link
Contributor

cvybhu commented Sep 20, 2023

Each unprepared statement has to be prepared on all nodes.
The driver sends a PREPARE request to every node, and in response it gets PreparedMetadata which contains information about this prepared statement.

Currently we assume that every node will return the same PreparedMetadata, after all we prepare the same statement on all of them. It turns out that this assumption isn't always true. During a rolling update different nodes might run different versions of Scylla, and different versions of Scylla might generate different PreparedMetadata for the same statement.
The driver ignores the fact that PreparedMetadata is different on some nodes, and because of this it might send a malformed request to them.

For example we could have the following query:

CREATE TABLE tks.tab (p int, c int, PRIMARY KEY (p, c))
SELECT p FROM tks.tab WHERE p = :x AND c = :x

scylladb/scylladb@19a6e69 changed how prepared queries look like when the same named bind variable is used multiple times.

For the above example, here's how the prepared metadata looks like on 2021.1.16:

PreparedMetadata {
    flags: 1,
    col_count: 2,
    pk_indexes: [
        PartitionKeyIndex {
            index: 0,
            sequence: 0,
        },
    ],
    col_specs: [
        ColumnSpec {
            table_spec: TableSpec {
                ks_name: "tks",
                table_name: "tab",
            },
            name: "x",
            typ: Int,
        },
        ColumnSpec {
            table_spec: TableSpec {
                ks_name: "tks",
                table_name: "tab",
            },
            name: "x",
            typ: Int,
        },
    ],
}

And here's how it looks on 2022.2.12:

PreparedMetadata {
    flags: 1,
    col_count: 1,
    pk_indexes: [],
    col_specs: [
        ColumnSpec {
            table_spec: TableSpec {
                ks_name: "tks",
                table_name: "tab",
            },
            name: "x",
            typ: Int,
        },
    ],
}

Note that the number of bind variables has changed - on 2022.1.16 Scylla expects two bound variables, but in 2022.2.12 it expects to see only one.

If we try to do a rolling update from 2021.1.16 to 2022.2.12 and run the example query, the following will happen:

  1. All nodes are on the old version
  2. Driver starts up and prepares the query, it receives PreparedMetadata generated by the old version
  3. The rolling update starts
  4. One node goes down and comes back up, but it now runs the new version of Scylla
  5. Driver tries to execute a prepared statement on this node, but receives UNPREPARED, as this node's cache has been cleared during the restart
  6. Driver sends PREPARE request, and receives the PreparedMetadata generated by the new version of Scylla
  7. Driver ignores that this node responded with a different PreparedMetadata
  8. Driver tries to send queries to the new node, but it uses the old PreparedMetadata
  9. The queries fail because the driver sends two bind variables, but the new Scylla expects only one

The driver should be aware that some nodes require the old PreparedMetadata, and some require the new one. It should keep all of the required versions around and use the correct one for each node.

Refs: #575

@mykaul
Copy link
Contributor

mykaul commented Sep 21, 2023

I wonder if it's a Rust specific issue - do other drivers behave correctly?

@avelanarius
Copy link

I don't think this issue is solveable for a case of two clients.

Let's suppose that client A does this:

  1. All nodes are on the old version
  2. Driver starts up and prepares the query, it receives PreparedMetadata generated by the old version
  3. The rolling update starts
  4. One node goes down and comes back up, but it now runs the new version of Scylla
  5. Driver tries to execute a prepared statement on this node, but receives UNPREPARED, as this node's cache has been cleared during the restart
  6. Driver sends PREPARE request, and receives the PreparedMetadata generated by the new version of Scylla
  7. Driver ignores that this node responded with a different PreparedMetadata
    8.Driver tries to send queries to the new node, but it uses the old PreparedMetadata
  8. The queries fail because the driver sends two bind variables, but the new Scylla expects only one

and in the meantime client B:

  1. All nodes are on the old version
  2. Driver starts and prepares the query, it receives PreparedMetadata generated by the old version
  3. In the meantime client A does all steps 1. - 9.
  4. Now if the client B tries to execute previously prepared query, it won't have to reprepare the query - client A already did. Therefore client B will have no idea that anything change and it will try to use the old prepared metadata.

Therefore closing this issue, I don't think that there is a good solution for this without changing the protocol (V5 protocol solved this problem).

@avelanarius avelanarius closed this as not planned Won't fix, can't repro, duplicate, stale Jan 16, 2024
@Lorak-mmk
Copy link
Collaborator

If protocol V5 solved this, shouldn't we keep this open until we and Scylla introduce support for it?

@mykaul
Copy link
Contributor

mykaul commented Jan 16, 2024

If protocol V5 solved this, shouldn't we keep this open until we and Scylla introduce support for it?

That's going to take some time...

@Lorak-mmk
Copy link
Collaborator

Having this issue open doesn't hurt anybody, and it describes a bug that users might encounter

@Lorak-mmk Lorak-mmk reopened this Jan 16, 2024
@Lorak-mmk Lorak-mmk removed their assignment Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants