Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rptest: add log_compaction_test.py #24187

Merged
merged 9 commits into from
Dec 9, 2024

Conversation

WillemKauf
Copy link
Contributor

@WillemKauf WillemKauf commented Nov 19, 2024

Serves as a stress test for compaction and tombstone removal.

Depends on changes made in redpanda-data/kgo-verifier#60, and so the commit SHA for ducktape-deps/kgo-verifier has been updated (CI will be red if this isn't merged yet)

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fixes a bug in which segments which may have tombstones in them were not considered eligible for self-compaction

Improvements

  • Adds a number of new metrics to improve observability of the compaction subsystem:
    • Added the _segment_cleanly_compacted metric to segment::probe.
    • Added the _segments_marked_tombstone_free metric to segment::probe.
    • Added the _num_rounds_window_compaction metric to segment::probe.

@WillemKauf WillemKauf requested a review from andrwng November 19, 2024 21:37
@WillemKauf WillemKauf requested a review from a team as a code owner November 19, 2024 21:37
@WillemKauf WillemKauf requested review from clee and removed request for a team November 19, 2024 21:37
@WillemKauf WillemKauf marked this pull request as draft November 20, 2024 17:04
@WillemKauf WillemKauf force-pushed the tombstones_testing branch 2 times, most recently from def5f7d to fc193af Compare November 21, 2024 21:20
@WillemKauf WillemKauf marked this pull request as ready for review November 21, 2024 21:20
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
tests/rptest/tests/log_compaction_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/log_compaction_test.py Outdated Show resolved Hide resolved
tests/rptest/tests/log_compaction_test.py Outdated Show resolved Hide resolved
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Nov 26, 2024

the below tests from https://buildkite.com/redpanda/redpanda/builds/58820#019369b8-b74d-4c4f-9b4a-3da1ca75260f have failed and will be retried

storage_e2e_single_thread_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/59112#01938800-b170-476c-89c6-a08ca0f89f11 have failed and will be retried

kafka_server_rpfixture

the below tests from https://buildkite.com/redpanda/redpanda/builds/59290#019397a1-424a-44d4-8ee9-8d90c6545947 have failed and will be retried

gtest_raft_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/59376#01939c49-56cd-4534-8cca-6e8c29cafbad have failed and will be retried

partition_balancer_simulator_test_rpunit

@dotnwat
Copy link
Member

dotnwat commented Dec 6, 2024

   assert consumer.consumer_status.validator.tombstones_consumed == 0
AssertionError

eek!

@WillemKauf
Copy link
Contributor Author

WillemKauf commented Dec 6, 2024

DEBUG 2024-12-06 00:12:13,392 [shard 0:lc ] storage-gc - disk_log_impl.cc:1312 - Compaction of {kafka/tapioca/0} stopped because of shutdown

DEBUG 2024-12-06 00:12:49,334 [shard 0:lc ] storage-gc - segment_utils.cc:749 - detected /var/lib/redpanda/data/kafka/tapioca/0_22/9228-1-v1.compaction_index is already compacted

TRACE 2024-12-06 00:12:53,410 [shard 0:main] storage - segment.cc:92 - closing segment: {reader={/var/lib/redpanda/data/kafka/tapioca/0_22/9228-1-v1.log, (470102 bytes)}, clean_compact_timestamp:{{timestamp: 1733443928831}}, may_have_tombstone_records:1}

This is a segment closed with a clean_compact_timestamp marked but with may_have_tombstone_records == 1.

Eek indeed. I believe this is a bug.

This if statement here should check if the segment may_have_tombstone_records and proceed with self compacting it, even if the compacted_index::recovery_state is already_compacted

@WillemKauf
Copy link
Contributor Author

1500 CI runs pays off - added commit to deal with the above.

470fdc3

@dotnwat
Copy link
Member

dotnwat commented Dec 6, 2024

This if statement here should check if the segment may_have_tombstone_records and proceed with self compacting it, even if the compacted_index::recovery_state is already_compacted

Interesting. Is this a case where we would have been better off not falling back to old style compaction after sliding window (is that's whats happening there with self compact)?

@WillemKauf
Copy link
Contributor Author

/ci-repeat 5
release
skip-units
dt-repeat=100
tests/rptest/tests/log_compaction_test.py

@redpanda-data redpanda-data deleted a comment from vbotbuildovich Dec 6, 2024
Segments that may have tombstone records in them should still be
considered eligible for self compaction.

An early return statement was missing a check for this condition.
Add it so that tombstones will be properly removed for a segment
eligible for removal.

Also adjusts a `vassert()` to account for this case.
A metric that measures the number of segments that have been
cleanly compacted (i.e, had their keys de-duplicated with
all previous segments before them to the front of the log).
A metric that measures the number of segments that have been
verified through the compaction process to be tombstone free.
A metric that measures the number of rounds of sliding window compaction
that have been driven to completion.
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
Also add an error message to the `wait_until()` call.
@WillemKauf
Copy link
Contributor Author

(oops. only half of my commit got added last night. that's what I get for working late)

Interesting. Is this a case where we would have been better off not falling back to old style compaction after sliding window (is that's whats happening there with self compact)?

No, there's only two places where tombstones are removed, during self compaction de-duplication and sliding window deduplication.

Therefore the tombstones for this case would have been removed if more sliding window rounds took place in the future, but since we produce and then stop with this test, it's expected that self compaction is going to remove the rest of the tombstones before the consumer sees them (which wouldn't happen before this fix)

@vbotbuildovich
Copy link
Collaborator

Retry command for Build#59376

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/availability_test.py::AvailabilityTests.test_recovery_after_catastrophic_failure
tests/rptest/tests/topic_creation_test.py::CreateTopicReplicaDistributionTest.test_topic_aware_distribution

@WillemKauf
Copy link
Contributor Author

/ci-repeat 5
release
skip-units
dt-repeat=100
tests/rptest/tests/log_compaction_test.py

@dotnwat
Copy link
Member

dotnwat commented Dec 6, 2024

Retry command for Build#59376

Where all the failures for other tests?

@WillemKauf
Copy link
Contributor Author

/ci-repeat 5
release
skip-units
skip-redpanda-build
dt-repeat=100
tests/rptest/tests/log_compaction_test.py

@WillemKauf
Copy link
Contributor Author

Looking good

@WillemKauf WillemKauf requested a review from dotnwat December 9, 2024 14:07
@WillemKauf WillemKauf merged commit 5a38373 into redpanda-data:dev Dec 9, 2024
17 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v24.3.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v24.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-24187-v24.3.x-979 remotes/upstream/v24.3.x
git cherry-pick -x 1f5f3633ee 5b50f25e9b 8d66388ecf 87d442bf2a 85bea2b00b 15ab6a8557 ac273cc321 985beffa20 d238b1c9b6

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants