Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rptest: add log_compaction_test.py #24187

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

WillemKauf
Copy link
Contributor

Serves as a stress test for compaction and tombstone removal.

Depends on changes made in redpanda-data/kgo-verifier#60, and so the commit SHA for ducktape-deps/kgo-verifier has been updated (CI will be red if this isn't merged yet)

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@WillemKauf WillemKauf requested a review from a team as a code owner November 19, 2024 21:37
@WillemKauf WillemKauf requested review from clee and removed request for a team November 19, 2024 21:37
@WillemKauf WillemKauf marked this pull request as draft November 20, 2024 17:04
@WillemKauf WillemKauf force-pushed the tombstones_testing branch 2 times, most recently from def5f7d to fc193af Compare November 21, 2024 21:20
@WillemKauf WillemKauf marked this pull request as ready for review November 21, 2024 21:20
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
@redpanda-data redpanda-data deleted a comment from vbotbuildovich Nov 22, 2024
assert consumer.consumer_status.validator.tombstones_consumed > 0
assert consumer.consumer_status.validator.invalid_reads == 0

def part_two(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: part_one/part_two are not the most convenient names for future readers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed these functions.


# Sleep until the log has definitely been fully compacted.
timeout = self.extra_rp_conf['log_compaction_interval_ms'] / 1000 * 10
time.sleep(timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it doesn't guarantee that the log was compacted. It's better to read a metric to validate that compaction happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a number of metrics to the storage::probe that are now used in this test case, and will be of future use for compaction observability:

  • _segment_cleanly_compacted, the number of segments that have been cleanly compacted (i.e, had their keys de-duplicated with all previous segments before them to the front of the log).
  • _segments_marked_tombstone_free, the number of segments that have beenverified through the compaction process to be tombstone free.
  • _num_rounds_window_compaction, the number of rounds of sliding window compaction that have been driven to completion.

I have removed this sleep() in place of a deterministic check that compaction has settled using some of these metrics.

consumer.wait(timeout_sec=60)

partition_move_thread.join()
self._stop_stress_fibers()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why stress is needed only here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved stress fiber to run continuously throughout the test.

A metric that measures the number of segments that have been
cleanly compacted (i.e, had their keys de-duplicated with
all previous segments before them to the front of the log).
A metric that measures the number of segments that have been
verified through the compaction process to be tombstone free.
A metric that measures the number of rounds of sliding window compaction
that have been driven to completion.
See redpanda-data/kgo-verifier#60,
which added `--tombstone-probability`, `--compacted`,
`--validate-latest-values` as input parameters.
@vbotbuildovich
Copy link
Collaborator

the below tests from https://buildkite.com/redpanda/redpanda/builds/58820#019369b8-b74d-4c4f-9b4a-3da1ca75260f have failed and will be retried

storage_e2e_single_thread_rpunit

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58843#01936ace-5008-4e64-98c4-5d82dd412623:

"rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.S3"

@vbotbuildovich
Copy link
Collaborator

Retry command for Build#58843

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":1}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants