-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use prepared statement result metadata to decode rows #925
Use prepared statement result metadata to decode rows #925
Conversation
I need to adjust the test case so we alter the table in some other way. The logs from CI:
|
I would prefer if you summarized your changes using prose and not a bulleted list. As a reviewer, I'm interested in:
Your bullet points don't give me this information, not easily at least. They concentrate too much on the details. For example, this point:
doesn't give any justification what it is for. After some adjustments it would serve well as a commit title (which should also be accompanied with a description unless the title itself is descriptive enough). After reading all the bullet points I can form some idea about how the whole thing looks like, given that I'm familiar with the codebase and I created the issue that is being fixed here. However, other developers who will want to modify your code in the future might not have that luxury. Besides helping maintainers, reviewers and other developers in the long run, I believe that preparing such a summary gives one a good opportunity to reflect on their proposed changes and help convince oneself that their changes make sense (or decide that they don't). |
Thank you for the feedback, I'll make sure to adjust the PR description as well as commit messages that are not descriptive enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to know more about how the driver behaves when a signal to reprepare is missed (i.e. some other application reprepares and this one misses it). The scenario with SELECT *
and adding a column is known, but I wonder what happens if we have a UDT column or a tuple and we add a field to the UDT/tuple. I think this case might actually be quite common contrary to SELECT *
so I'm wondering whether other drivers handle it correctly and if we handle it correctly.
Could you do a test and see how our driver behaves in this scenario? I'd also like to know how other drivers handle it, for example the Java driver - it seems the most mature of the drivers I know.
So I've tested the UDT case you provided with both rust and java drivers. The key thing to notice is that java driver actually does NOT update the result metadata of the prepared statement during repreparations. The metadata is immutable - set only once during the first preparation of the statement. In result, the scenario you described fails even when only one client connects to the database. The demo (on import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.data.UdtValue;
public class CreateAndPopulateKeyspace {
public static void main(String[] args) {
CreateAndPopulateKeyspace client = new CreateAndPopulateKeyspace();
try {
client.connect();
client.createSchema();
client.prepareStatement();
client.alterSchema();
client.reprepareStatement();
} catch (Exception ex) {
ex.printStackTrace();
} finally {
client.close();
}
}
private CqlSession session;
private PreparedStatement preparedStatement;
public void connect() {
session = CqlSession.builder().build();
System.out.printf("Connected session: %s%n", session.getName());
}
public void createSchema() {
session.execute("DROP KEYSPACE IF EXISTS simplex");
session.execute(
"CREATE KEYSPACE IF NOT EXISTS simplex WITH replication "
+ "= {'class':'SimpleStrategy', 'replication_factor':1};");
session.execute("CREATE TYPE simplex.my_type (a int)");
session.execute(
"CREATE TABLE simplex.tab (a int PRIMARY KEY, b int, c simplex.my_type)");
session.execute("INSERT INTO simplex.tab (a, b, c) VALUES (1, 2, { a: 1 })");
}
public void prepareStatement() {
preparedStatement = session.prepare("SELECT c FROM simplex.tab");
}
public void alterSchema() {
session.execute("ALTER TYPE simplex.my_type ADD d blob");
}
public void reprepareStatement() {
Row row = session.execute(preparedStatement.bind()).one();
UdtValue udtValue = row.getUdtValue("c");
System.out.println(udtValue);
}
public void close() {
if (session != null) {
session.close();
}
}
} The output:
This is the exact same reason why The same test case that uses rust driver works since we update the client-side result metadata after statement repreparation. However, this won't work in the following 2-client scenario:
Now the question is whether we want to follow the java driver's approach and make Unfortunately, no solution to the multi-client scenario comes to my mind. Since java driver is vulnerable to this as well, I believe there is no reasonable solution to the problem. AFAIK, the 5th version of CQL protocol offers some solution where the server informs all of its clients about the change of metadata. |
Thanks for checking. I think it makes more sense to make the result metadata immutable. The approach with |
217f794
to
aa2ff61
Compare
v2: rebased on top of master |
cargo semver-checks output
|
aa2ff61
to
e19421a
Compare
cargo semver-checks output
|
v3 - addressed review comments:
Question related to tests:
Note: there is still an issue with |
e19421a
to
312556b
Compare
v4: adjusted docstrings so the CI passes |
cargo semver-checks output
|
Yes, the proxy would be the right tool for this kind of test. |
And - thanks for adding the descriptions as I requested. I really appreciate that. Don't forget to update it when you make changes to the implementation that would make it outdated (I see that it mentions |
cargo semver-checks output
|
9500189
to
db33284
Compare
cargo semver-checks output
|
db33284
to
50560e4
Compare
cargo semver-checks output
|
50560e4
to
46b59b7
Compare
cargo semver-checks output
|
5519635
to
e8b792a
Compare
cargo semver-checks output
|
e8b792a
to
46b59b7
Compare
cargo semver-checks output
|
85efa6a
to
a55c246
Compare
cargo semver-checks output
|
Rebased on top of main, so the CI passes |
a55c246
to
6a871cc
Compare
v5: addressed review comments |
cargo semver-checks output
|
6a871cc
to
030a5f4
Compare
Hidden the `PreparedMetadata` getter, as it's a low-level struct. Exposed two getters for the information that might be useful to the users: - column specs of bind variables - partition key indexes of bind variables
Added a `result_metadata` to `PreparedStatementSharedData` struct so the driver caches the result metadata provided by the server after statement preparation.
Added a new integration test which tests whether the SKIP_METADATA optimization works correctly when executing prepared statements. The test case does two things: - executes a statement without SKIP_METADATA flag. Using the proxy, it verifies that server sent metadata in the response. - executes a statement with SKIP_METADATA flag. Using the proxy, it verifies that the server did not attach metadata in the response.
030a5f4
to
58a9885
Compare
v6:
|
cargo semver-checks output
|
Re-exported types that should be exposed to users. Re-exported types: - ColumnSpec - ColumnType - CqlValue - PartitionKeyIndex - Row - TableSpec Note for reviewers: As mentioned in scylladb#925 (comment), we should not expose the `PreparedMetadata` structure. I believe the same applies to `ResultMetadata`. That's why I decided not to re-export them.
Re-exported types that should be exposed to users. Re-exported types: - ColumnSpec - ColumnType - CqlValue - PartitionKeyIndex - Row - TableSpec Note for reviewers: As mentioned in scylladb#925 (comment), we should not expose the `PreparedMetadata` structure. I believe the same applies to `ResultMetadata`. That's why I decided not to re-export them.
Re-exported types that should be exposed to users. Re-exported types: - ColumnSpec - ColumnType - CqlValue - PartitionKeyIndex - Row - TableSpec Note for reviewers: As mentioned in scylladb#925 (comment), we should not expose the `PreparedMetadata` structure. I believe the same applies to `ResultMetadata`. That's why I decided not to re-export them.
Fixes: #169
Motivation
Currently, the driver does not make a use of an optimization that CQL protocol offers during exeuction of prepared statements. The client is allowed to provide
SKIP_METADATA
flag duringSession::execute
, so the server doesn't attach a result set metadata. The driver can use the cached metadata (received in response toSession::prepare
) to deserialize the results.Changes
PreparedStatement
As mentioned previously, the driver can cache the prepared statement's result metadata. This is why we extend the
PreparedStatementSharedData
byresult_metadata
field which holds the metadata. The metadata is obtained from the server during statement preparation (Session::prepare
). This metadata is eventually passed to the function which deserializes the results of query execution.Schema changes
Before introducing the optimization, the driver was immune to the problems related to prepared statement's result metadata and schema changes.
After some schema change, Scylla and Cassandra invalidate the server-side caches and discard all of the prepared statements. When the driver executes a prepared statement after the schema change for the first time, it receives the
UNPREPARED
error frame (server does not recognize the id of the prepared statement since it was discarded). Currently, Scylla drivers handle such case by repreparation of the statement. Notice that after the schema change, the result metadata may be changed (depends on the actual change of schema). Without the optimization, Scylla always provides a valid (updated during repreparation) result metadata in the response to statement execution so we don't need to worry about it.This raises an issue when we try to cache the metadata on the client side. It should be mutable so the driver holds a metadata that is always up to date (should be updated during reprepares). However, looking at the other drivers we can see that the client-side metadata caches are immutable. This is why we follow the same approach and not care about the schema changes.
Config
Extended and
StatementConfig
byskip_result_metadata
boolean flag which defaults tofalse
. When set totrue
, it enables the optimization. Thanks to that, the users can enable the optimization per-statement.Points to discuss
skip_result_metadata
flag be enabled by default? Since it's an optimization, it might make sense for it to be enabled by default.Pre-review checklist
./docs/source/
.Fixes:
annotations to PR description.