-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interval log support #70
Conversation
Start work on writing logs.
Added a sample file from the Java impl. Added max value divisor.
I'm confused. Why do we now have our own Will try to look over this at some point in not too long, but flying to a conference in Shanghai tomorrow morning, so may take a little while :) |
The No big rush on reviewing it; I need some time to ponder the above timestamp issues anyway. |
I want to use that in interval log benchmarks, hence tackling that TODO. For serialization benchmarks, this is just right: we want to exercise all different varint lengths (modulo bucketing, which throws it off a little bit). For non-serialization benches (e.g. add and subtract) this is still a lot better than just regular random numbers, which will be heavily weighted towards the last few buckets. This drops measurements quite a bit, but that's expected because now histograms aren't mostly using values way in the high end that all get bucketed together, so now there's a lot more work to do when traversing the histogram.
It's fast enough that I'm not stressing about optimizing it further.
I wrote up my interpretation of how the Java impl treats StartTime and BaseTime. I now have this wishlist for interval log v2:
Spur of the moment strawman syntax:
Not sure what to do about metadata for an interval. Maybe re-use url query param encoding? But almost everyone gets that wrong, so perhaps it's best to steer clear of that mess. |
Oh, and also: Fixes #59. |
/// will be represented by the iterator providing the corresponding variants here. The presence | ||
/// of those timestamps will affect how you should interpret the timestamps for individual | ||
/// intervals. | ||
#[allow(variant_size_differences)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be worried about a 56-byte enum variant?
|
||
/// Write a StartTime log line. See the module-level documentation for more info. | ||
/// | ||
/// This should only be called once to avoid creating a confusing interval log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should be enforced by panicking if it's called twice. I couldn't think of a way to enforce it with the compiler without it being rather clumsy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this some more, IntervalLogHeaderWriter
really just seems like a builder to me. Maybe that's the way we should present it. That would allow us to only write things out when the user has committed to them, and then only write them once. Something like:
let mut ilb = IntervalLogBuilder::default().with_start_time(42);
ilb.add_comment("I love comments");
let mut logger = ilb.build(&mut writer, &mut serializer);
This has some other neat secondary effects such as headers being easy to re-use, fields that can only be set once only being written once, and more efficient writes (because they can be batch-written on .build()
. We can bikeshed the exact name for .build()
, but overall following the builder pattern as outlined in the Rust API guidelines would be a good route to take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the builder approach.
/// Add a comment line. | ||
/// | ||
/// If you do silly things like write a comment with a newline character, you'll end up with | ||
/// an un-parseable log file. Don't do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe panic? Or filter silently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or .replace('\n', "\n#")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, maybe that would be better to protect users from accidentally horking their logs. I suppose it should probably look for all of https://en.wikipedia.org/wiki/Newline#Unicode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want char::is_control
, which includes all the Unicode control characters (including newlines).
/// The timestamp may be absolute vs the epoch, or there may be a `StartTime` or `BaseTime` for | ||
/// the log, in which case you may wish to consider this number as a delta vs those timestamps. | ||
/// See the module-level documentation about timestamps. | ||
pub fn start_timestamp(&self) -> f64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a method to return a chrono
NaiveDateTime
, but it didn't seem worth the dependency to do something that the consumer can just as easily do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, given that it might be entirely user-defined, I'm not sure we should.
/// | ||
/// `tag` is an optional tag for this histogram. | ||
/// | ||
/// `max_value_divisor` is used to scale down the max value to something that may be more human |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be fixed for the entire interval log? It seems like it can't be a good idea to allow different divisors for different intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that sounds like a good idea. And it fits nicely with the proposed builder pattern!
//! log.extend_from_slice(b"3.789,1.007,2.769,base64EncodedHisto\n"); | ||
//! log.extend_from_slice(b"Tag=b,4.123,1.007,2.769,base64EncodedHisto\n"); | ||
//! log.extend_from_slice(b"5.456,1.007,2.769,base64EncodedHisto\n"); | ||
//! log.extend_from_slice(b"#Another comment\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be more readable with multi-line strings:
let mut log = "\
#I'm a comment\n\
Tag=a,0.123,1.007,2.769,base64EncodedHisto\n\
1.456,1.007,2.769,base64EncodedHisto\n\
3.789,1.007,2.769,base64EncodedHisto\n\
Tag=b,4.123,1.007,2.769,base64EncodedHisto\n\
5.456,1.007,2.769,base64EncodedHisto\n\
#Another comment\n"
.to_vec();
//! let count = iter.map(|r| r.unwrap()) | ||
//! // only look at intervals (which are the only non-comment lines in this log) | ||
//! .filter_map(|e| match e { | ||
//! interval_log::LogEntry::Interval(ilh) => Some(ilh), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would match on Ok(interval_log::LogEntry::Interval(ilh))
here to avoid the unwrap
just above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having read further down, I also don't see why the IntervalLogIterator
yields a Result
. What is the Err
case? And wouldn't it be easier to use with the errors being another LogEntry
type? Or does the iterator behave weirdly once an Err
has been yielded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only error (at the moment) is a parsing error, beyond which we can't really proceed. We could have errors represented in LogEntry
, though that feels somewhat odd to me for reasons I can't really pin down. Not in love with the current way, though, that's for sure... an iterator of Result
s is clumsy. One good thing about Result
is that it enables easy use of ?
, which I don't think we would get from having LogEntry
have an error variant.
Speaking of which, the iterator should probably only return None
once it's returned a Some(Err(...))
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all the above.
//! header_writer.write_comment("Comments are great").unwrap(); | ||
//! header_writer.write_start_time(123.456789).unwrap(); | ||
//! | ||
//! let mut log_writer = header_writer.into_log_writer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't read further down yet, but is an IntervalLogHeaderWriter
the only way to get an IntervalLogWriter
? If so, I think the former should be given a different name.
internal_writer: InternalLogWriter<'a, 'b, W, S>, | ||
} | ||
|
||
impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> IntervalLogHeaderWriter<'a, 'b, W, S> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the IntervalLogHeaderWriter
should be parameterized on S
. Instead, the call to into_log_writer
should be. The header writes are all independent of the Serializer
as far as I can tell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it doesn't use Serializer
, so it can definitely be deferred. I was torn about whether to take the Serializer
up front or wait until it was needed, and if it feels weird to you to do it eagerly, that's enough of a reason to switch for me.
|
||
/// Once you're finished with headers, convert this into a log writer so you can write interval | ||
/// histograms. | ||
pub fn into_log_writer(self) -> IntervalLogWriter<'a, 'b, W, S> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See notes above about making this more builder-like, and then only committing to a writer + serializer at this point (rather than when creating the header writer).
/// | ||
/// This isn't created directly; start with an `IntervalLogHeaderWriter`. Once you've written the | ||
/// headers and ended up with an `IntervalLogWriter`, typical usage would be to write a histogram | ||
/// at regular intervals (e.g. once a second). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an example would be more useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll add one once we get the builder API sorted out.
/// instead use a delta since that time. See the discussion about timestamps in the module-level | ||
/// documentation. | ||
/// | ||
/// `duration` is the duration of the interval in seconds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason you didn't make this a time::Duration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I didn't think of it once I dismissed the idea of bringing in chrono
;)
self.text_buf.clear(); | ||
let _len = self.serializer | ||
.serialize(h, &mut self.serialize_buf) | ||
.map_err(|e| IntervalLogWriterError::SerializeError(e))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add
impl<E> From<E> for IntervalLogWriterError<E> {}
so that you can remove this map_err
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... without a way to bound E
, that would map every possible error type to SerializeError
, which seems undesirable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But isn't that what the IntervalLogWriterError
type already suggests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there is the IoError
variant, so it's not just a wrapper for serialization errors. Having everything else map to SerializeError(E)
feels risky; wouldn't that make it possible to have a typo where some other Result
-producing call gets a ?
appended to it, and presto now we're returning SerializeError(SomeOtherUnrelatedError)
.
io::Error
seems like a reasonable thing for From
because it's called many times and is unambiguous. I've never used an open-ended From
like that so maybe my unease is unfounded, but it reminds me unpleasantly of Scala's implicits which made code hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that IntervalLogWriterError
's one type argument is only used in the SerializeError
variant. If we were to add some other wrapped error, that would be of a different type, which would change IntervalLogWriterError
to take two type arguments <E1, E2>
. In that case, an impl
of From
wouldn't make much sense without further bounds. But in the current case, if you have an error and want to produce some IntervalLogWriterError<E>
, then that works exactly when that error is of type E
. Which is what the above From
would work for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tightened up the bounds a bit:
pub enum IntervalLogWriterError<S: Serializer> {
/// Histogram serialization failed.
SerializeError(S::SerializeError),
/// An i/o error occurred.
IoError(io::ErrorKind),
}
Nonetheless, the compiler gets grumpy:
error[E0119]: conflicting implementations of trait
std::convert::From<serialization::interval_log::IntervalLogWriterError<_>>
for type `serialization::interval_log::IntervalLogWriterError<_>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What code is it it's complaining on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extracted it into this playground: https://play.rust-lang.org/?gist=d6826495d99f9718c0980197a0fe14d3&version=stable
I think what the compiler is getting at is that you could have a Serializer
whose SerializeError
was io::Error
, and then there would be no way to know which impl
of From
to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that makes sense. Oh well, leave it as-is I guess.
} | ||
} | ||
|
||
impl<'a> ops::Deref for Tag<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to impl
AsRef
instead here. Though not clear that it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not real clear on when I should use one vs the other, but String
impl's Deref<Target=str>
so I was cargo-culting off of that since this seems like a similar sort of relationship. Since this is just a convenience for hypothetical usage, it's hard for me to say what would be best, but I'm happy to switch if AsRef
is the Right Answer(TM).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is really poorly documented. Reading a bit more though, I think implementing Deref
might actually be the right thing after all, so leave it as-is.
pub struct IntervalLogHistogram<'a> { | ||
tag: Option<Tag<'a>>, | ||
start_timestamp: f64, | ||
duration: f64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this should be exposed as time::Duration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make write_histogram
use time::Duration
too.
|
||
/// Errors that occur when parsing an interval log. | ||
#[derive(Debug, PartialEq)] | ||
pub enum LogIteratorError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unfortunate, but makes total sense. There's some related discussion here: https://users.rust-lang.org/t/handling-errors-from-iterators/2551/20. I guess it's fine.
/// | ||
/// This parses from a slice representing the complete file because it made implementation easier | ||
/// (and also supports mmap'd files for maximum parsing speed). If parsing from a `Read` is | ||
/// important for your use case, open an issue about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting point. I suspect parsing from a Read
will be a very useful addition, and one that will be tricky to combine with the way the code is currently written, but I agree that we can deal with that on-demand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it will be too bad; we'll just have to read into a buffer, parse what we can, and grow the buffer if we get Incomplete
. There's some streaming support in nom
, but I didn't investigate too closely. More than I felt like tackling on top of learning nom
. :)
} | ||
|
||
impl<'a> Iterator for IntervalLogIterator<'a> { | ||
type Item = Result<LogEntry<'a>, LogIteratorError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some issues with this.
If I've understood the log format correctly, this iterator will always first yield at most one each of StartTime
and BaseTime
, and from that point forwards, only yield LogEntry::Interval
. If that interpretation is correct, then it seems like this should really be an iterator that yields IntervalLogHistogram<'a>
, with some kind of initial step that gives you the start/base time. That'd be much more convenient to work with, as it would obviate the need for that pesky match
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an ideal world, that would be true: well defined headers, which we read once, and then only expose body elements (ala HTTP). However, I'm not sure that in the wild these logs actually are structured that way. The Java impl lets you write those header-ish comments anywhere you want as often as you want, and supports reading them interleaved with interval logs as well, so maybe that's a thing people actually do? I asked Gil but I haven't heard from him in a while on Gitter; he must be busy with other stuff.
I figure that for users who (1) have well-structured logs and (2) are bothered by this API it would be quick work to wrap this Iterator of Results into a much tidier thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I suppose... It seems unfortunate, because it makes the API a fair amount less ergonomic. I guess we could provide a refined iterator on top of this, whereas the reverse is not true, so leave it like this for now.
@@ -137,7 +144,7 @@ | |||
//! // Make some histograms | |||
//! for _ in 0..num_histograms { | |||
//! let mut h = Histogram::<u64>::new_with_bounds(1, u64::max_value(), 3).unwrap(); | |||
//! h.record_n(42, 7); | |||
//! h.record_n(42, 7).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want rust-lang/rfcs#1937 so we can make this use ?
.
@@ -0,0 +1,126 @@ | |||
extern crate rand; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module is only used in tests, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. I'll give it a comment to make it less confusing.
I like the change overall. See some suggestions for structural/interface changes inline above. |
- Doc test tidying - Use `time::Duration` for interval duration
tests/interval_log.rs
Outdated
@@ -250,10 +268,11 @@ mod tests { | |||
} | |||
} | |||
|
|||
println!("{}", ::std::str::from_utf8(&log_buf).unwrap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is just leftover from debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh!
/// will be represented by the iterator providing the corresponding variants here. The presence | ||
/// of those timestamps will affect how you should interpret the timestamps for individual | ||
/// intervals. See the module-level documentation. | ||
#[allow(variant_size_differences)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we care about this? Not really sure I see a way around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought variant_size_differences
was allowed by default? I think it's fine for this enum
to have a somewhat large size variant, though it is also another reason to make the iterator only yield IntervalLogHistogram
(as the enum would no longer be necessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are, but they're turned off in the crate-wide deny
statement. The variant is only large because IntervalLogHistogram
is large, though, so removing the enum isn't really going to help the "copying fat structs around" issue (though I think it's unlikely to be a performance issue).
)) | ||
/// This can be called multiple times, but only the value for the most recent invocation will | ||
/// be written. | ||
pub fn with_start_time(&mut self, seconds_since_epoch: f64) -> &mut Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if these should take time::SystemTime
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... perhaps. That would mean we'd have to do something about handling the error case for duration_since()
, though. We could just take the Duration
instead of the SystemTime
to push that decision off to the caller, but still have something more "time-flavored" than a lowly f64
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let since_epoch = match time.duration_since(UNIX_EPOCH) {
Ok(d) => dur_to_f64(d),
Err(ste) => -1 * dur_to_f64(ste.duration())
};
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that should work, and we're very unlikely to hit the Err case, though I wonder if maybe we should panic because your timekeeping is clearly hosed if your system thinks it's before 1970?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh -- if we accept SystemTime
, we can't have repeatable tests because you can't construct an arbitrary SystemTime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you can, by adding to UNIX_EPOCH
;)
As for panicking, I'd rather avoid it given that we don't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. In that case I'm 👍 because I can get my testing done and we can always provide another way to set the times if SystemTime
turns out to be awkward for users.
} | ||
|
||
/// Build a LogWriter and apply any configured headers. | ||
pub fn build_with<'a, 'b, W: 'a + io::Write, S: 'b + Serializer>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty much the opposite of what https://aturon.github.io/ownership/builders.html suggests: we're taking the required things at the end, not the beginning, but being able to use the same setup on multiple writers or serializers seems hypothetically useful, and this API does not offend me otherwise.
Let the name bikeshedding begin! :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I'm okay with this approach.
I actually don't mind build_with
. I think I'd be partial to a name that fits better in context though. How about begin_log_with
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, WFM.
|
||
// The Java impl doesn't write a comment for this but it's confusing to silently modify the | ||
// max value without leaving a trace | ||
if self.max_value_divisor != 1.0_f64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Straying off the reservation here but it is so easy to misinterpret a log (as a human) without this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it.
max_value_divisor: self.max_value_divisor, | ||
}; | ||
|
||
for c in self.comments.iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI, you can use for c in &self.comments
here.
/// Add a comment line to be written when the writer is built. | ||
/// | ||
/// If you do silly things like write a comment with a newline character, you'll end up with | ||
/// an un-parseable log file. Don't do that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to make sure that #70 (comment) doesn't get lost in these changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, got it tracked in the opening comment on the PR. This is one of the things I prefer about BitBucket: PR tasks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tightened up parsing to only accept newlines as a line ending (because \r
should be shunned), and parsing, which was already quite fast, sped up over 3x!
Anyway, it seems to work fine if we only transform \n
into newline-plus-# and leave everything else as-is. Is there a particular reason to turn any control character into a subsequent line of comments?
pub fn write_histogram<T: Counter>( | ||
&mut self, | ||
h: &Histogram<T>, | ||
start_timestamp: f64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should maybe be a Duration
as well? 🤔 Conceivably, that's how it would be used: a Duration
since either UNIX_EPOCH
or whatever the chosen StartTime
/BaseTime
was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, that's a good idea!
#[allow(variant_size_differences)] | ||
pub enum LogEntry<'a> { | ||
/// Logs may include a StartTime. If present, it represents seconds since the epoch. | ||
StartTime(f64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I suppose this should be Duration
then too. More expensive to parse into Duration
but these should be rare relative to histograms, so it's probably not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it back, that should be SystemTime
I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, now I'm not so sure. There's not really much useful you can do with a SystemTime
except turn it back into a Duration
based on UNIX_EPOCH
, which would force the user to do another round of error checking on the Result
. So, it seems probably more desirable to give them the Duration
directly. The path from Duration
to chrono::naive::NaiveDateTime
and friends is then pretty smooth.
@@ -365,7 +366,11 @@ impl<'a, 'b, W: 'a + io::Write, S: 'b + Serializer> InternalLogWriter<'a, 'b, W, | |||
} | |||
|
|||
fn write_comment(&mut self, s: &str) -> io::Result<()> { | |||
write!(self.writer, "#{}\n", s) | |||
for l in s.split('\n') { | |||
write!(self.writer, "#{}\n", l)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you not still have to escape control characters? \r
for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not unless we start treating \r
as a line separator. I added a test that has every control character in a comment. It gets a new line with a leading #
for the \n
but all others are left alone, and it parses fine.
Parsing into an f64 then separately truncating to an int and multiplying the fractional part by 1_000_000_000 introduced errors in the fractional part since frequently abc.xyz cannot produce 0.xyz as an f64, but has to have 0.xyz0000001 or something like that.
IResult::Done(rest, data) => { | ||
let (secs, nanos_str) = data; | ||
|
||
// only read up to 9 digits since we can only support nanos, not smaller precision |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"only" heh
cf9ec24 fixes the last task "Better type for reading StartTime/BaseTime", right? |
Yep I think this is good to go! |
Nice! Good work. |
Mostly done; figured I'd get feedback on what I had so far.
The log format has some issues:
Reading logs
This exposes a somewhat low level interface: it doesn't keep track of an observed
BaseTime
and add it to interval histograms' timestamps, for instance. Instead, it exposes anIterator
of the useful log lines, from which it would be simple to apply your own preference for dealing with timestamps, skipping logs you don't care about, etc. I decided to try usingnom
for parsing, and I think it came out pretty well.Writing logs
Pretty straightforward. I chose to make a
Tag
abstraction so that we could guarantee the lack of tag-forbidden characters in astr
with the type system. I also split writing duties into two separate types to prevent users from writing lines I think should only exist in the preamble (StartTime
andBaseTime
) once they've started writing interval histograms. (Helper methods for those things aren't there yet but the structure is in place.)To do
StartTime
andBaseTime
lines. There's no support for writing them yet mainly because I can't figure out what their semantics are in a clear enough way to document it in a way that isn't terribly confusing. They seem to sort of clash with each other.