Add support for bounded trie metric in legacy worker #33474

Merged
merged 5 commits into from
Feb 6, 2025


rohitsinha54
Contributor

@rohitsinha54 rohitsinha54 commented Jan 2, 2025

  • Adds support for bounded trie in Dataflow legacy workers.
  • Streaming reports delta counters. This is essential because otherwise a streaming job would keep reporting the same counters cumulatively for its entire lifespan. That would be a problem even for a bounded trie, which is self-limiting in size: in the backend, the LevelDB monitoring writer takes a snapshot of the counters on every publish and writes it to LevelDB, so for a long-running job the file sizes would keep growing even though the data structure itself is bounded.
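
To illustrate the difference between delta and cumulative reporting described above, here is a minimal Java sketch. It is not the Beam or Dataflow worker implementation: the class `SetMetricSketch` and its methods are hypothetical names introduced only for this example, and a plain sorted set stands in for the bounded trie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch; not the Beam or Dataflow worker implementation. */
final class SetMetricSketch {
  // Everything ever added: what a cumulative report would carry.
  private final TreeSet<String> cumulative = new TreeSet<>();
  // Only the values added since the last delta report.
  private final TreeSet<String> sinceLastReport = new TreeSet<>();

  void add(String value) {
    cumulative.add(value);
    sinceLastReport.add(value);
  }

  /** Cumulative report: the payload never shrinks over the job's lifetime. */
  List<String> reportCumulative() {
    return new ArrayList<>(cumulative);
  }

  /** Delta report: return what was added since the last report, then reset. */
  List<String> reportDelta() {
    List<String> delta = new ArrayList<>(sinceLastReport);
    sinceLastReport.clear();
    return delta;
  }

  public static void main(String[] args) {
    SetMetricSketch metric = new SetMetricSketch();
    metric.add("bucket/a");
    metric.add("bucket/b");
    System.out.println(metric.reportDelta());      // [bucket/a, bucket/b]
    metric.add("bucket/c");
    System.out.println(metric.reportDelta());      // [bucket/c]
    System.out.println(metric.reportCumulative()); // [bucket/a, bucket/b, bucket/c]
  }
}
```

In this sketch each delta report carries only what changed since the previous one, so a writer that persists every report stores a small payload per publish instead of a full snapshot each time.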


@rohitsinha54 rohitsinha54 changed the title from "Btrie legacy worker" to "Add support for bounded trie metric in legacy worker" Jan 2, 2025
Contributor

github-actions bot commented Jan 2, 2025

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Contributor

github-actions bot commented Feb 1, 2025

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @johnjcasey added as fallback since no labels match configuration

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@rohitsinha54
Contributor Author

R: @robertwb

Contributor

github-actions bot commented Feb 1, 2025

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@rohitsinha54
Contributor Author

@robertwb Please take a look. Thanks.

MetricKey key, boolean isCumulative, BoundedTrieData boundedTrieData) {
// BoundedTrie uses SET kind metric aggregation which tracks unique strings as a trie.
CounterStructuredNameAndMetadata name = structuredNameAndMetadata(key, Kind.SET);
BoundedTrie counterUpdateTrie = getBoundedTrie(boundedTrieData.toProto());

@rohitsinha54
Contributor Author

The two failures seem unrelated to this change.

Contributor

@robertwb robertwb left a comment


Thanks, this looks good. Just one small question.

@robertwb
Contributor

robertwb commented Feb 5, 2025

I'm still seeing

2025-02-05T18:44:03.8370351Z > Task :runners:google-cloud-dataflow-java:worker:test
2025-02-05T18:44:03.8371552Z 
2025-02-05T18:44:03.8373642Z org.apache.beam.runners.dataflow.worker.BatchModeExecutionContextTest > extractMetricUpdatesBoundedTrie FAILED

in the failed test run.

@rohitsinha54
Contributor Author

Sorry. Fixed.


codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.09%. Comparing base (fc43c12) to head (e6c55be).
Report is 24 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master   #33474   +/-   ##
=========================================
  Coverage     59.08%   59.09%           
- Complexity     3238     3240    +2     
=========================================
  Files          1156     1156           
  Lines        176924   176924           
  Branches       3391     3391           
=========================================
+ Hits         104543   104553   +10     
+ Misses        69015    69008    -7     
+ Partials       3366     3363    -3     
Flag Coverage Δ
java 70.36% <ø> (+0.03%) ⬆️


@rohitsinha54
Contributor Author

The failing test testOnNewWorkerMetadata_redistributesBudget is not related to this change. I also ran it locally and it passed.

@rohitsinha54
Contributor Author

@robertwb could you please help merge this? Thanks.

@robertwb
Contributor

robertwb commented Feb 6, 2025

Yes. I was trying to get the tests to pass, but this does look unrelated (and possibly flaky). I'll go ahead and merge.

@robertwb robertwb merged commit 222ad95 into apache:master Feb 6, 2025
28 of 29 checks passed
@rohitsinha54
Contributor Author

Thank you.

Created a cherry-pick (CP) to 2.63: #33890
