-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add log lines about removed data #24659
Conversation
This adds some accounting to the copy_data_segment_reducer to keep track of records and batches that are removed. The goal here is ultimately to provide more visibility into removed records, though this accounting isn't yet plugged into anything. A later change will expose numbers (likely with logging).
Removing records from the storage layer is currently very silent, making it difficult to debug. This commit exposes the new copy_data_segment_reducer stats in as a log message, at info level when data is removed, and debug level when not (presumably we mostly care about these stats when data is actually being removed).
CI test resultstest results on build#60138
test results on build#60371
|
const auto& stats = res.reducer_stats; | ||
if (stats.has_removed_data()) { | ||
vlog( | ||
gclog.info, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong feelings that this shouldn't be at the info level, but I can imagine that it would make compaction output for a log with a high number of segments much noisier.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely more noisy, but we already put up with an info level message when we create a segment, so maybe not too bad. are there situations where we re-compact a segment many times, or is it usually <= 2x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm trying to be cognizant about noise in making this conditionally a debug log.
It's certainly possible we'll compact a segment multiple times (it's not hard to imagine with high cardinality segments). But generally I'd argue that despite with the noise, if we're removing data, the log line is worth it. There was one recent instance where tracing back a record's removal was impossible because of a lack of observability in this area.
/ci-repeat |
/backport v24.3.x |
/backport v24.2.x |
Failed to create a backport PR to v24.2.x branch. I tried:
|
Failed to create a backport PR to v24.3.x branch. I tried:
|
It's very difficult to retroactively debug why records are removed by compaction. This adds some logging about removed data.
Backports Required
Release Notes
Improvements