Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce CqlTimeuuid type #894

Merged
merged 10 commits into from
Jan 26, 2024
79 changes: 79 additions & 0 deletions scylla/src/transport/cql_types_test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ use crate::transport::session::IntoTypedRows;
use crate::transport::session::Session;
use crate::utils::test_utils::unique_keyspace_name;
use bigdecimal::BigDecimal;
use itertools::Itertools;
use num_bigint::BigInt;
use scylla_cql::frame::value::CqlTimeuuid;
use scylla_cql::types::serialize::value::SerializeCql;
Expand Down Expand Up @@ -1149,6 +1150,84 @@ async fn test_timeuuid() {
}
}

#[tokio::test]
async fn test_timeuuid_ordering() {
let session: Session = create_new_session_builder().build().await.unwrap();
let ks = unique_keyspace_name();

session
.query(
format!(
"CREATE KEYSPACE IF NOT EXISTS {} WITH REPLICATION = \
{{'class' : 'NetworkTopologyStrategy', 'replication_factor' : 1}}",
ks
),
&[],
)
.await
.unwrap();
session.use_keyspace(ks, false).await.unwrap();

session
.query(
"CREATE TABLE tab (p int, t timeuuid, PRIMARY KEY (p, t))",
(),
)
.await
.unwrap();

// Timeuuid values, sorted in the same order as Scylla/Cassandra sorts them.
let sorted_timeuuid_vals: Vec<CqlTimeuuid> = vec![
CqlTimeuuid::from_str("00000000-0000-1000-8080-808080808080").unwrap(),
CqlTimeuuid::from_str("00000000-0000-1000-ffff-ffffffffffff").unwrap(),
CqlTimeuuid::from_str("00000000-0000-1000-0000-000000000000").unwrap(),
CqlTimeuuid::from_str("fed35080-0efb-11ee-a1ca-00006490e9a4").unwrap(),
CqlTimeuuid::from_str("00000257-0efc-11ee-9547-00006490e9a6").unwrap(),
CqlTimeuuid::from_str("ffffffff-ffff-1fff-ffff-ffffffffffef").unwrap(),
CqlTimeuuid::from_str("ffffffff-ffff-1fff-ffff-ffffffffffff").unwrap(),
CqlTimeuuid::from_str("ffffffff-ffff-1fff-0000-000000000000").unwrap(),
CqlTimeuuid::from_str("ffffffff-ffff-1fff-7f7f-7f7f7f7f7f7f").unwrap(),
];
Comment on lines +1180 to +1190
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if we test sorting, it would be good to have unsorted data. You can use https://docs.rs/rand/latest/rand/seq/trait.SliceRandom.html#tymethod.shuffle to avoid having 2 literals in the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the test non-deterministic... If one of the 362880 possible orderings results in an incorrectly sorted sequence then you won't know which one failed. I think we should either use a fixed seed or use a random one but print it during the test so that we can reproduce the results.

I also think that we should test multiple orderings in one run with sort() - let's say several thousand times, so that this part of the test doesn't run longer than a couple milliseconds. Of course, it's unpractical to insert this data to ScyllaDB for each of the orderings, but I don't think it matters because we can assume that ScyllaDB's ordering works correctly - inserting this data in any order only once and then verifying that it matches our expected order should be enough.

Copy link
Collaborator

@piodul piodul Jan 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also use a random generator with known algorithm - std::thread_rng uses an unspecified algorithm which can probably change in the future. We already have tests with fixed pseudorandomness - for example, see partitioners_output_same_result_no_matter_how_input_is_partitioned.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, 362880 is not a huge number. Perhaps checking all the permutations would not take much time (let's aim under 100ms)? We can iterate over permutations using Itertools::permutations().


// Generate all permutations.
let perms = Itertools::permutations(sorted_timeuuid_vals.iter(), sorted_timeuuid_vals.len())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Itertools is an extention trait for iterators. While what you wrote works, you are supposed to use it on the iterators directly:

let perms = sorted_timeuuid_vals
    .iter()
    .permutations(sorted_timeuuid_vals.len())
    .collect::<Vec<_>>();

.collect::<Vec<_>>();
// Ensure that all of the permutations were generated.
assert_eq!(362880, perms.len());
Comment on lines +1192 to +1196
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the perms are going to take a few tens of megabytes of memory: an allocation of 362880 * 3 * 8 = ~8.3MB of contiguous memory for the direct storage of perms alone (but may be up to twice as much due to vec exponential reallocation) and 362880 * 9 * 16 = ~50MB for the timeuuids alone (spread over 362880 individual allocations).

While this is not very much for today's standards, you need to take into account that many test cases can run in parallel, so this memory usage adds up - and it's a good practice in general to be mindful of those allocations.

In this case, it could have been easily and completely avoided by not collecting the permutations iterator but rather consuming it in a loop.

Again, not a huge problem and not blocking the merge in my opinion. Can be fixed in a followup.


// Verify that Scylla really sorts timeuuids as defined in sorted_timeuuid_vals
let prepared = session
.prepare("INSERT INTO tab (p, t) VALUES (0, ?)")
.await
.unwrap();
for timeuuid_val in &perms[0] {
session.execute(&prepared, (timeuuid_val,)).await.unwrap();
}

let scylla_order_timeuuids: Vec<CqlTimeuuid> = session
.query("SELECT t FROM tab WHERE p = 0", ())
.await
.unwrap()
.rows_typed::<(CqlTimeuuid,)>()
.unwrap()
.map(|r| r.unwrap().0)
.collect();

assert_eq!(sorted_timeuuid_vals, scylla_order_timeuuids);

for perm in perms {
// Test if rust timeuuid values are sorted in the same way as in Scylla
let mut rust_sorted_timeuuids: Vec<CqlTimeuuid> = perm
.clone()
.into_iter()
.map(|x| x.to_owned())
.collect::<Vec<_>>();
rust_sorted_timeuuids.sort();

assert_eq!(sorted_timeuuid_vals, rust_sorted_timeuuids);
}
}

#[tokio::test]
async fn test_inet() {
let session: Session = init_test("inet_tests", "inet").await;
Expand Down