KAFKA-19481: Fix flaky test testConsumerGroupHeartbeatWithRegex #20298

Open
wants to merge 1 commit into
base: trunk

@jmmonte2 jmmonte2 commented Aug 4, 2025

Description

This change fixes the flaky test testConsumerGroupHeartbeatWithRegex, which fails with the following error (Develocity link):
org.opentest4j.AssertionFailedError: Unexpected assignment ConsumerGroupHeartbeatResponseData(throttleTimeMs=0, errorCode=0, errorMessage=null, memberId='OGfeiEjOQbqUTsJgtGMCdQ', memberEpoch=1, heartbeatIntervalMs=5000, assignment=null) ==> expected: not <null>

I addressed the issue by using TestUtils.tryUntilNoAssertionError() to allow for retries.
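
For illustration, the retry idea can be sketched in plain Java. This is a minimal sketch and not Kafka's TestUtils implementation: the class name `RetrySketch`, the helper `retryUntilAssertionsPass`, and its parameters are hypothetical, and the background thread only stands in for the async work the test waits on.

```java
import java.util.concurrent.atomic.AtomicReference;

final class RetrySketch {

    // Hypothetical helper (not Kafka's TestUtils): re-runs `assertions` until it stops
    // throwing AssertionError or `maxWaitMs` elapses, then rethrows the last failure.
    static void retryUntilAssertionsPass(long maxWaitMs, long pauseMs, Runnable assertions) {
        long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
        while (true) {
            try {
                assertions.run();
                return; // every assertion held on this attempt
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: surface the most recent failure
                }
                pause(pauseMs);
            }
        }
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the assignment that only appears once background work completes.
        AtomicReference<String> assignment = new AtomicReference<>();
        Thread background = new Thread(() -> {
            pause(200);
            assignment.set("topic-0");
        });
        background.start();

        retryUntilAssertionsPass(5_000, 50, () -> {
            if (assignment.get() == null) {
                throw new AssertionError("Unexpected assignment: null");
            }
        });
        System.out.println("assignment = " + assignment.get());
        background.join();
    }
}
```

A single immediate assertion would fail whenever the background work is slow. The retry loop tolerates that delay and still fails if the condition never becomes true within the deadline.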

Root Cause:
The failure occurs because the test depends on an asynchronous operation, refreshRegularExpressions, within GroupMetadataManager, which may not have completed by the time the assertion runs.

  • One indicator that the async operation did not finish is memberEpoch=1 in the log: when this test runs successfully, memberEpoch is 2 at the end. The epoch is initially updated from 0 -> 1 here. Once refreshRegularExpressions is done, handleRegularExpressionsResult updates the epoch from 1 -> 2 here
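
The epoch sequence described here can be modelled with a toy example. It is a sketch under stated assumptions, not Kafka code: `EpochRaceSketch`, `Member`, and `firstHeartbeat` are hypothetical names, and a `CountDownLatch` is used as a deterministic stand-in for a slow async resolution.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

final class EpochRaceSketch {

    // Toy stand-in for a group member's state; not Kafka's data model.
    static final class Member {
        volatile int epoch = 0;
        volatile List<String> assignment = null;
    }

    // Models the first heartbeat: the epoch goes 0 -> 1 right away, while the
    // regex resolution runs in the background and later bumps it 1 -> 2.
    static CompletableFuture<Void> firstHeartbeat(Member member, CountDownLatch resolutionGate) {
        member.epoch = 1;
        return CompletableFuture.runAsync(() -> {
            try {
                resolutionGate.await(); // holds the async work back, like an injected delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            member.assignment = List.of("topic-0"); // publish the assignment first
            member.epoch = 2;                       // then the epoch bump
        });
    }

    public static void main(String[] args) {
        Member member = new Member();
        CountDownLatch gate = new CountDownLatch(1);
        CompletableFuture<Void> resolution = firstHeartbeat(member, gate);

        // Asserting here is the flaky case: the background work has not run yet.
        System.out.println("before: epoch=" + member.epoch + " assignment=" + member.assignment);

        gate.countDown();
        resolution.join();
        System.out.println("after:  epoch=" + member.epoch + " assignment=" + member.assignment);
    }
}
```

In this model, a check made before the gate opens sees epoch 1 with no assignment, which mirrors the failing log. A check made after the background work completes sees epoch 2 with an assignment.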

refreshRegularExpressions is responsible for resolving regular-expression-based subscriptions to the current set of matching topic names in the cluster.
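
As a rough picture of what "resolving" a regex subscription means, the sketch below filters a list of topic names by a pattern. It assumes `java.util.regex` purely for illustration; `RegexSubscriptionSketch` and `resolve` are hypothetical names, and the broker's actual regex engine, metadata source, and authorization checks are not modelled.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class RegexSubscriptionSketch {

    // Hypothetical helper: returns the topics whose full name matches the regex.
    static Set<String> resolve(String regex, List<String> clusterTopics) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> matched = new TreeSet<>(); // sorted for stable output
        for (String topic : clusterTopics) {
            if (pattern.matcher(topic).matches()) {
                matched.add(topic);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> clusterTopics = List.of("topic-a", "other", "topic-b");
        System.out.println(resolve("topic.*", clusterTopics));
    }
}
```

Because the set of topics in the cluster can change, the result has to be recomputed over time, which is one reason this kind of resolution is a natural fit for background work.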

Testing

  • Used this command
    for i in {1..100}; do echo "Run #$i"; ./gradlew :core:integrationTest --rerun-tasks --tests kafka.api.AuthorizerIntegrationTest.testConsumerGroupHeartbeatWithRegex; if [ $? -ne 0 ]; then echo "Test failed on run #$i"; exit 1; fi; done; echo "All 100 runs passed successfully."
  • Also added an intentional wait to the async call and confirmed that the test passed

Note:
I ran the test 100 times locally but was unable to reproduce the error. The only way I could reproduce it was by adding a one-second sleep to the async call, which produced the exact same error.

@github-actions github-actions bot added the triage, core, tests, and small labels Aug 4, 2025
@jmmonte2 jmmonte2 changed the title from "KAFKA-1948: Fix flaky test testConsumerGroupHeartbeatWithRegex" to "KAFKA-19481: Fix flaky test testConsumerGroupHeartbeatWithRegex" Aug 4, 2025
@jmmonte2 jmmonte2 marked this pull request as ready for review August 4, 2025 02:01
Contributor

@squah-confluent squah-confluent left a comment

lgtm, since testConsumerGroupHeartbeatWithRegexWithTopicDescribeAclAddedAndRemoved follows the same approach.

Can we update the PR description to explain what the async operation and test failure are? It's not immediately clear unless you're familiar with regex subscriptions.

@github-actions github-actions bot removed the triage label Aug 5, 2025
Author

jmmonte2 commented Aug 5, 2025

> lgtm, since testConsumerGroupHeartbeatWithRegexWithTopicDescribeAclAddedAndRemoved follows the same approach.
>
> Can we update the PR description to explain what the async operation and test failure are? It's not immediately clear unless you're familiar with regex subscriptions.

@squah-confluent Thanks for the review. I went ahead and updated the description.

Contributor

@squah-confluent squah-confluent left a comment

@jmmonte2 That's more than enough detail, thank you for updating the description!

Author

jmmonte2 commented Aug 6, 2025

@squah-confluent Thank you! I noticed the approval did not go through due to lack of write access. Do you have another reviewer I can add on?

Author

@FrankYang0529 Could you please also review?
