
[CORE-7961] rptest: Add ducktape test for partition movement in RRR cluster #24159

Open · wants to merge 1 commit into base: dev
Conversation

@Lazin (Contributor) commented Nov 18, 2024

Add a new ducktape test. The test creates two clusters (source and RRR), produces data to the source cluster, and consumes from the RRR cluster. The test then initiates partition movement for every partition of the RRR cluster.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@Lazin Lazin requested a review from andrwng November 18, 2024 15:33
@vbotbuildovich (Collaborator) commented Nov 18, 2024

the below tests from https://buildkite.com/redpanda/redpanda/builds/58193#01933fe3-0d52-4e33-9640-662e140cfed0 have failed and will be retried

gtest_raft_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/58697#019363ec-d093-4aea-b1e7-a31e5d1ea5f6 have failed and will be retried

partition_balancer_simulator_test_rpunit

@vbotbuildovich (Collaborator):
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58193#01934040-98ba-4bc6-81b0-58fdd6bca7ce:

"rptest.tests.e2e_iam_role_test.AWSRoleFetchTests.test_write"

@vbotbuildovich (Collaborator):
Retry command for Build#58193

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/e2e_iam_role_test.py::AWSRoleFetchTests.test_write

Comment on lines +509 to +524
for part_id in range(0, partition_count):
    assignments = self._get_node_assignments(admin, self.topic_name,
                                             part_id)
    self.logger.info(
        f"initial assignments for {self.topic_name}/{part_id}: {assignments}"
    )
    replicas = set([r['node_id'] for r in assignments])
    for b in brokers:
        if b['node_id'] not in replicas:
            assignments[0] = {"node_id": b['node_id']}
            break
    self.logger.info(
        f"new assignments for {self.topic_name}/{part_id}: {assignments}"
    )
    self._set_partition_assignments(self.topic_name, part_id,
                                    assignments, admin)
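The replica-replacement step in this hunk can be exercised in isolation: for each partition, pick the first broker that holds no replica and swap it in for the first replica. A minimal standalone sketch of the same logic (the helper name `pick_new_assignments` is hypothetical; the real test works through the Admin API helpers on `self`):

```python
def pick_new_assignments(assignments, brokers):
    """Replace the first replica with a broker that currently holds
    no replica of the partition (mirrors the loop in the hunk above)."""
    replicas = {r['node_id'] for r in assignments}
    new_assignments = list(assignments)
    for b in brokers:
        if b['node_id'] not in replicas:
            new_assignments[0] = {"node_id": b['node_id']}
            break
    return new_assignments

# Example: replicas live on nodes 1, 2, 3; broker 4 holds no replica,
# so it replaces node 1 in the first slot.
assignments = [{"node_id": 1}, {"node_id": 2}, {"node_id": 3}]
brokers = [{"node_id": n} for n in range(1, 5)]
print(pick_new_assignments(assignments, brokers))
# → [{'node_id': 4}, {'node_id': 2}, {'node_id': 3}]
```

Note the loop mutates only one replica slot per partition, so each partition undergoes exactly one movement; if every broker already holds a replica, the assignment is left unchanged.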
Member:

is the goal to run partition movement concurrently with data arriving into the destination cluster?

Contributor Author:

Not necessarily; the goal is just to make sure that fast partition movement is not broken for RRR. We have a gap in our testing here.

Member:

should we verify that fast partition movement was used?

@Lazin (Contributor, Author) commented Nov 25, 2024

/ci-repeat 1
tests/rptest/tests/e2e_iam_role_test.py::AWSRoleFetchTests.test_write

@Lazin Lazin requested a review from dotnwat November 25, 2024 15:20
self.start_consumer()

# Wait until reconfigurations are completed and start consuming again
wait_until(lambda: len(admin.list_reconfigurations()) == 0, 30)
Member:

please add an informative error message parameter to this wait_until
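ducktape's `wait_until` accepts an `err_msg` keyword argument whose text is surfaced when the condition never becomes true, which is what makes timeout failures debuggable from logs. A self-contained sketch of the pattern (a simplified stand-in for `ducktape.utils.util.wait_until`, not the real implementation):

```python
import time

def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Simplified stand-in for ducktape.utils.util.wait_until:
    poll `condition` until it returns True or `timeout_sec` elapses,
    then raise TimeoutError carrying `err_msg`."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise TimeoutError(err_msg or "Condition not met within timeout")

# The reviewer's suggestion, applied to the snippet under review, would
# look roughly like this (admin comes from the test class, as above):
#   wait_until(lambda: len(admin.list_reconfigurations()) == 0,
#              30,
#              err_msg="partition reconfigurations did not finish in 30s")
wait_until(lambda: True, 1, err_msg="condition should pass immediately")
```

Without `err_msg`, a timeout reports only a generic failure, so the log gives no hint which of the several `wait_until` calls in the test tripped.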

Comment on lines +527 to +528
wait_until(
lambda: len(admin.list_reconfigurations()) == partition_count, 30)
Member:

please add an informative error message parameter to this wait_until


# Wait until reconfigurations are completed and start consuming again
wait_until(lambda: len(admin.list_reconfigurations()) == 0, 30)
self.start_consumer()
Member:

why do we start the consumer and then immediately end the test?

3 participants