-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove MissingUserDefinedType from public API #1162
base: main
Are you sure you want to change the base?
Conversation
Previously this method accepted a map with UDTs for all keyspaces. This is not necessary: UDTs in one keyspace can not reference UDTs in another keyspace.
This function performs a request that fetches columns for all tables and views. It is potentially the most performance-impactful part of the schema fetching process, and it was unnecessarily called twice: in query_tables and in query_views. It can be easily prevented by calling the function earlier and passing the result to query_tables and query_views.
MissingUserDefinedType was an obstacle preventing us from unifying ColumnType and CqlType. This commit changes the topology fetching code to remove the keyspace where such an error happened from the metadata. A warning is printed in such case.
It is no longer part of any public API.
This commit changes type of `keyspaces` field in `Metadata` to from `HashMap<String, Keyspace>` to `HashMap<String, Result<Keyspace, MissingUserDefinedType>>`. Because of that, it also removed `MissingUserDefinedType` handling from `query_metadata`. Now handling this error is done in `ClusterData::new`. This has an advantage: we can use older version of the keyspace metadata if the new version has this error.
See the following report for details: cargo semver-checks output
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I really dislike how complicated all the methods have become. They accept hashmap where value is a Result
, and return a really complicated type with nested Result
. I don't see how we could simplify this, though. It looks horrible, but does the job - we can propagate the error for each keyspace up to ClusterData::new()
thanks to this.
let mut tables_schema = query_tables_schema(conn, keyspaces_to_fetch, &udts).await?; | ||
( | ||
query_tables(conn, keyspaces_to_fetch, &udts).await?, | ||
query_views(conn, keyspaces_to_fetch, &udts).await?, | ||
query_tables(conn, keyspaces_to_fetch, &mut tables_schema).await?, | ||
query_views(conn, keyspaces_to_fetch, &mut tables_schema).await?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we pass a mutable reference to tables_schema
twice. Both query_tables
and query_views
remove items from the map. Are we guaranteed that we will try to remove each entry (keyspace, table/view)
at most once? I think it makes sense to make such assumption (I don't think there can be a view with the same name as some table within one keyspace), but I'm asking just to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I launched a Scylla instance, created a keyspace, and inside this keyspace a table and a materialized view.
Materialized view was only present in system_schema.views
(not in system_schema.tables
) while the table was only present in system_schema.tables
(not system_schema.views
).
I can also check with a secondary index, and verify that it is the same for C*, but no other scenarios / tests come to my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the scenario you checked is enough
scylla/src/transport/topology.rs
Outdated
async fn query_views( | ||
conn: &Arc<Connection>, | ||
keyspaces_to_fetch: &[String], | ||
tables: &mut HashMap<(String, String), Table>, | ||
) -> Result<HashMap<String, HashMap<String, MaterializedView>>, QueryError> { | ||
tables: &mut HashMap<(String, String), Result<Table, MissingUserDefinedType>>, | ||
) -> Result< | ||
HashMap<String, Result<HashMap<String, MaterializedView>, MissingUserDefinedType>>, | ||
QueryError, | ||
> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really surprised that clippy does not complain about the complexity of return type. Do you think that introducing a documented generic type alias would aid readability of the code? The same pattern Result<HashMap<K, Result<V, MissingUserDefinedType>>, QueryError>
appears in 3 different methods (query_[tables_schema/tables/views]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could introduce a type alias like type PerKeyspaceResult<T, E> = HashMap<String, Result<T, E>
.
The return type here would become
Result<PerKeyspaceResult<HashMap<String, MaterializedView>, MissingUserDefinedType>, QueryError>
Other than that I am open to ideas about improving the simplicity of the code here.
let (tables, views, user_defined_types) = match (tables, views, user_defined_types) { | ||
(Ok(t), Ok(v), Ok(u)) => (t, v, u), | ||
(Err(e), _, _) | (_, Err(e), _) | (_, _, Err(e)) => return Ok((keyspace_name, Err(e))), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 Please add a comment justifying this logic.
@@ -1507,9 +1566,42 @@ async fn query_tables_schema( | |||
return Ok::<_, QueryError>(()); | |||
} | |||
|
|||
let keyspace_udts = udts.get(&keyspace_name).unwrap_or(&empty_map); | |||
let keyspace_udts = match udts.get(&keyspace_name).unwrap_or(&empty_ok_map) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔧 I think a type annotation would aid readability here.
&None, | ||
host_filter.as_deref(), | ||
TabletsInfo::new(), | ||
&HashMap::new(), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ typo in commit Move handling of MissingUserDefinedType to ClusterData::new
: to from
-> from
Up until now
CqlType::UserDefinedType
contained aResult
, with error typeMissingUserDefinedType
.This error is an edge case that should be quite rare. It could possibly happen if we fetch metadata in the middle of schema change.
Possible scenario:
As this error should be extremely rare, it is okay to handle it a little bit less gracefully.
This simplifies user API (user no longer needs to handle the error) and makes progress towards unifying CqlType and ColumnType.
The new proposed way to handle this error is to drop the fetched metadata for the whole keyspace - re-using older version of it if available - and printing a warning.
As a side-fix, I found that
query_tables_schema
was unnecessarily called twice, so I deduplicated those calls.query_tables_schema
is likely the heaviest part of metadata reading, so this fix should lower the cluster load resulting from metadata fetches.Progresses towards: #691
Pre-review checklist
I added relevant tests for new features and bug fixes.I have provided docstrings for the public items that I want to introduce.I have adjusted the documentation in./docs/source/
.Fixes:
annotations to PR description.