storage: add log lines about removed data #24659

Merged: 2 commits into redpanda-data:dev on Jan 8, 2025

Conversation

andrwng (Contributor) commented Dec 25, 2024

It's very difficult to retroactively debug why records are removed by compaction. This adds some logging about removed data.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Adds logging about data removed by compaction.

This adds some accounting to the copy_data_segment_reducer to keep track
of records and batches that are removed. The goal here is ultimately to
provide more visibility into removed records, though this accounting
isn't yet plugged into anything.

A later change will expose these numbers (likely with logging).
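
As a rough illustration of the accounting described above, a minimal sketch might look like the following (the struct and field names are hypothetical, not the actual Redpanda code; only reducer_stats and has_removed_data() appear in the diff quoted later in this conversation):

    #include <cstddef>

    // Hypothetical per-segment counters the reducer bumps as it copies a
    // segment and discards entries during compaction.
    struct copy_segment_stats {
        size_t batches_discarded{0};
        size_t records_discarded{0};

        // True if this compaction pass actually removed anything.
        bool has_removed_data() const {
            return batches_discarded > 0 || records_discarded > 0;
        }
    };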
Removing records from the storage layer is currently very silent, making
it difficult to debug. This commit exposes the new
copy_data_segment_reducer stats as a log message: at info level when
data is removed, and at debug level when not (presumably we mostly care
about these stats when data is actually being removed).
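
A minimal sketch of that conditional log level, assuming the hypothetical stats struct above and the vlog/gclog helpers visible in the diff quoted further down (the gclog.debug branch and the message text are assumptions, not copied from the PR):

    const auto& stats = res.reducer_stats;
    if (stats.has_removed_data()) {
        // Data was actually removed: surface it at info level.
        vlog(gclog.info, "compaction removed data from segment: {}", stats);
    } else {
        // Nothing removed: keep the routine case quiet at debug level.
        vlog(gclog.debug, "compaction removed no data from segment: {}", stats);
    }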
vbotbuildovich (Collaborator) commented Dec 25, 2024

CI test results

test results on build#60138
test_id test_kind job_url test_status passed
rptest.tests.datalake.partition_movement_test.PartitionMovementTest.test_cross_core_movements.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/60138#0193fbc0-165c-46d3-8d19-3316953f53ef FLAKY 3/6
test results on build#60371
test_id test_kind job_url test_status passed
rm_stm_tests_rpunit.rm_stm_tests_rpunit unit https://buildkite.com/redpanda/redpanda/builds/60371#0194426f-9a77-4b0c-9ac5-2cef1d02af04 FLAKY 1/2
rm_stm_tests_rpunit.rm_stm_tests_rpunit unit https://buildkite.com/redpanda/redpanda/builds/60371#0194426f-9a78-4c03-889c-5960f45a0b1f FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/60371#019442b2-6b37-4c14-809f-4fa476fe7637 FLAKY 5/6
rptest.tests.partition_reassignments_test.PartitionReassignmentsTest.test_reassignments_kafka_cli ducktape https://buildkite.com/redpanda/redpanda/builds/60371#019442cc-edc4-47e7-bea7-3f1dd5aa955b FLAKY 3/6

    const auto& stats = res.reducer_stats;
    if (stats.has_removed_data()) {
        vlog(
          gclog.info,
Reviewer (Contributor) commented:
I don't have strong feelings that this shouldn't be at the info level, but I can imagine that it would make compaction output for a log with a high number of segments much noisier.

What do you think?

Reviewer (Member) commented:

Definitely noisier, but we already put up with an info-level message when we create a segment, so maybe not too bad. Are there situations where we re-compact a segment many times, or is it usually <= 2x?

andrwng (Contributor, Author) commented:

Yeah, I'm trying to be cognizant of the noise by making this conditionally a debug log.

It's certainly possible we'll compact a segment multiple times (it's not hard to imagine with high-cardinality segments). But generally I'd argue that, despite the noise, if we're removing data the log line is worth it. There was one recent instance where tracing back a record's removal was impossible because of a lack of observability in this area.

andrwng (Contributor, Author) commented Jan 7, 2025

/ci-repeat

andrwng requested review from WillemKauf and dotnwat on January 8, 2025 00:53
andrwng merged commit 0c0fd95 into redpanda-data:dev on Jan 8, 2025
18 checks passed
vbotbuildovich (Collaborator) commented:

/backport v24.3.x

vbotbuildovich (Collaborator) commented:

/backport v24.2.x

vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v24.2.x branch. I tried:

    git remote add upstream https://github.com/redpanda-data/redpanda.git
    git fetch --all
    git checkout -b backport-pr-24659-v24.2.x-347 remotes/upstream/v24.2.x
    git cherry-pick -x 33c3aa492e 8c84ffdabf

Workflow run logs.

vbotbuildovich (Collaborator) commented:

Failed to create a backport PR to v24.3.x branch. I tried:

    git remote add upstream https://github.com/redpanda-data/redpanda.git
    git fetch --all
    git checkout -b backport-pr-24659-v24.3.x-438 remotes/upstream/v24.3.x
    git cherry-pick -x 33c3aa492e 8c84ffdabf

Workflow run logs.
