-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[oximeter] Use stable hash, stable format, for deriving timeseries_key #4251
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -335,11 +335,13 @@ pub(crate) fn timeseries_key_for<'a>( | |
target_fields: impl Iterator<Item = &'a Field>, | ||
metric_fields: impl Iterator<Item = &'a Field>, | ||
) -> TimeseriesKey { | ||
use std::collections::hash_map::DefaultHasher; | ||
use highway::HighwayHasher; | ||
use std::hash::{Hash, Hasher}; | ||
let mut hasher = DefaultHasher::new(); | ||
let mut hasher = HighwayHasher::default(); | ||
for field in target_fields.chain(metric_fields) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we need to sort these before hashing. With the change in 8ffd476, these should be sorted when the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added this in 2e91139 -- as expected, this does not change the hashes, since things are already sorted, but it does make it explicit. I opted to alter the interface rather than re-sorting internally. |
||
field.hash(&mut hasher); | ||
bcs::to_bytes(&field) | ||
.expect("Failed to serialized field to bytes") | ||
smklein marked this conversation as resolved.
Show resolved
Hide resolved
|
||
.hash(&mut hasher); | ||
} | ||
hasher.finish() | ||
} | ||
|
@@ -393,4 +395,90 @@ mod tests { | |
assert!(TimeseriesName::try_from("a:").is_err()); | ||
assert!(TimeseriesName::try_from("123").is_err()); | ||
} | ||
|
||
#[test] | ||
fn test_timeseries_key_generation_hashes_fields_sequentially() { | ||
use super::timeseries_key_for; | ||
use oximeter::{Field, FieldValue}; | ||
|
||
let f = |name: &str, value| Field { name: name.to_string(), value }; | ||
|
||
// Confirm that "targets" and "metrics" are interchangeable, | ||
// we just hash everything sequentially. | ||
assert_eq!( | ||
timeseries_key_for( | ||
[&f("a", FieldValue::String("a".to_string()))].into_iter(), | ||
[&f("b", FieldValue::String("b".to_string()))].into_iter(), | ||
), | ||
timeseries_key_for( | ||
[ | ||
&f("a", FieldValue::String("a".to_string())), | ||
&f("b", FieldValue::String("b".to_string())), | ||
] | ||
.into_iter(), | ||
[].into_iter(), | ||
), | ||
); | ||
|
||
// However, order still matters ("a, b" != "b, a") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this test does accurately reflect the original implementation, but I think we don't want order to matter. What matters is the unordered set of field names, types, and values. Permuting them should not change the key, IMO, because that's really the same timeseries. So if we sort the keys in the function above, we should then There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have removed this test -- "stability of a sample" is more important -- but yes, this is fixed. For what it's worth, we're sorting "the targets fields" and "the metric fields" separately, but I think we're on the same page here. |
||
assert_ne!( | ||
timeseries_key_for( | ||
[&f("a", FieldValue::String("a".to_string()))].into_iter(), | ||
[&f("b", FieldValue::String("b".to_string()))].into_iter(), | ||
), | ||
timeseries_key_for( | ||
[&f("b", FieldValue::String("b".to_string()))].into_iter(), | ||
[&f("a", FieldValue::String("a".to_string()))].into_iter(), | ||
), | ||
); | ||
} | ||
smklein marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
#[test] | ||
fn test_timeseries_key_stability() { | ||
use super::timeseries_key_for; | ||
use oximeter::{Field, FieldValue}; | ||
use strum::EnumCount; | ||
|
||
let values = [ | ||
("string", FieldValue::String(String::default())), | ||
("i8", FieldValue::I8(-0x0A)), | ||
("u8", FieldValue::U8(0x0A)), | ||
("i16", FieldValue::I16(-0x0ABC)), | ||
("u16", FieldValue::U16(0x0ABC)), | ||
("i32", FieldValue::I32(-0x0ABC_0000)), | ||
("u32", FieldValue::U32(0x0ABC_0000)), | ||
("i64", FieldValue::I64(-0x0ABC_0000_0000_0000)), | ||
("u64", FieldValue::U64(0x0ABC_0000_0000_0000)), | ||
( | ||
"ipaddr", | ||
FieldValue::IpAddr(std::net::IpAddr::V4( | ||
std::net::Ipv4Addr::LOCALHOST, | ||
)), | ||
), | ||
("uuid", FieldValue::Uuid(uuid::Uuid::nil())), | ||
("bool", FieldValue::Bool(true)), | ||
]; | ||
|
||
// Exhaustively testing enums is a bit tricky. Although it's easy to | ||
// check "all variants of an enum are matched", it harder to test "all | ||
// variants of an enum have been supplied". | ||
// | ||
// We use this as a proxy, confirming that each variant is represented | ||
// here for the purposes of tracking stability. | ||
assert_eq!(values.len(), FieldValue::COUNT); | ||
smklein marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
let mut output = vec![]; | ||
for (name, value) in values { | ||
let key = timeseries_key_for( | ||
[&Field { name: name.to_string(), value }].into_iter(), | ||
[].into_iter(), | ||
); | ||
output.push(format!("{name} -> {key}")); | ||
} | ||
|
||
expectorate::assert_contents( | ||
"test-output/timeseries-keys.txt", | ||
&output.join("\n"), | ||
); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
string -> 11027713821408113420 | ||
i8 -> 11040867383141339877 | ||
u8 -> 15607688254753473406 | ||
i16 -> 13295860386975871341 | ||
u16 -> 2145565374036862015 | ||
i32 -> 10931977742674674182 | ||
u32 -> 6138968585984979982 | ||
i64 -> 981603688282082647 | ||
u64 -> 4754979326506678930 | ||
ipaddr -> 17297273002898199682 | ||
uuid -> 9564253805848631232 | ||
bool -> 13392625023595096799 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one other issue I've been ambivalent about is including the timeseries name in the key. This impl currently will generate the same key for instances of two different timeseries that happen to have the same fields. The database currently includes an ordering key on fields and measurement tables that looks like
(timeseries_name, timeseries_key, ...)
, so the key only needs to be unique within a timeseries. I'm basically wondering whether we want it globablly unique. Thoughts?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to disambiguate and I don't see much downside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other thing -- I think the hashing is currently based purely from the
Field
structs:omicron/oximeter/oximeter/src/types.rs
Lines 248 to 255 in 7e88bdf
However, the target and metric also have names, which could be distinct from the fields:
omicron/oximeter/oximeter/src/types.rs
Lines 644 to 649 in 7e88bdf
Should I be hashing this value too?
So in the case of:
The thing we have been hashing -- even before this PR -- are the
Field
values, which are:[ ("a", valueA), ("b", valueB), ("c", valueC), ("d", valueD)]
(NOTE: This means editing the "MyTarget" or "MyMetric" names will not change the hash, currently)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(EDIT: UPDATED BASED ON FEEDBACK BELOW)
WDYT about hashing the following for a sample, in-order: