Enforce removeSegment flow with _enableDeletedKeysCompactionConsistency #13914

Merged

@tibrewalpratik17 tibrewalpratik17 commented Aug 29, 2024

When _enableDeletedKeysCompactionConsistency is enabled, the removeSegment process must run so that distinctSegmentCount decreases by 1. Previously, #13347 assumed that during a replaceSegment operation (in either the commit or the refresh segment flow), addSegment would increase the count by 1 and removeSegment would decrease it by 1. However, removeSegment is not guaranteed to be triggered during replaceSegment.
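A minimal sketch of the bookkeeping this relies on, assuming the count is kept per primary key and that addSegment/removeSegment adjust it by 1 for every key in the segment. The class and method names here are illustrative only and are not Pinot's API:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-key segment counting; names are hypothetical, not Pinot's. */
class SegmentCountModel {
  // primary key -> number of distinct segments currently holding that key
  private final Map<String, Integer> distinctSegmentCount = new HashMap<>();

  /** Adding a segment bumps the count of every primary key it contains. */
  void addSegment(Iterable<String> primaryKeys) {
    for (String key : primaryKeys) {
      distinctSegmentCount.merge(key, 1, Integer::sum);
    }
  }

  /** Removing a segment lowers the count; the entry is dropped when it reaches zero. */
  void removeSegment(Iterable<String> primaryKeys) {
    for (String key : primaryKeys) {
      distinctSegmentCount.computeIfPresent(key, (k, count) -> count > 1 ? count - 1 : null);
    }
  }

  int countFor(String key) {
    return distinctSegmentCount.getOrDefault(key, 0);
  }
}
```

In this model, replacing a segment is an add of the new copy followed by a remove of the old one, so the count for a key returns to its previous value. If the remove never runs, the count stays one too high after every replacement.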

Here are the steps currently followed during replaceSegment in the ConcurrentMapPartitionUpsertMetadataManager class:

  1. Add the new segment.
  2. Check whether any keys in the old segment remain unreplaced. If so, delete those keys from the hashmap using removeSegment(oldSegment, remainingValidDocIDsSnapshotNotReplaced) [Ref].
  3. Remove oldSegment from _trackedSegments [Ref].

During the destroy process of SegmentDataManager, the removeSegment(segment) method is called. However, this becomes a no-op since the segment has already been removed from _trackedSegments in step 3 [Ref].
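The flow above can be modelled roughly as follows. This is a simplified sketch, not the actual ConcurrentMapPartitionUpsertMetadataManager code: segments are represented by plain strings, step 2 is omitted, and the `fullRemoveCalls` counter is a hypothetical addition used only to observe whether removeSegment(segment) did any real work.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of the replaceSegment flow described above; not the Pinot class. */
class TrackedSegmentsModel {
  final Set<String> trackedSegments = new HashSet<>();
  int fullRemoveCalls = 0; // hypothetical counter: times removeSegment(segment) did real work

  void addSegment(String segment) {
    trackedSegments.add(segment);
  }

  /** Mirrors removeSegment(segment): returns early when the segment is not tracked. */
  void removeSegment(String segment) {
    if (!trackedSegments.contains(segment)) {
      return; // no-op: the segment was already untracked
    }
    fullRemoveCalls++;
    trackedSegments.remove(segment);
  }

  void replaceSegment(String newSegment, String oldSegment) {
    addSegment(newSegment);             // step 1
    // step 2 (removing keys left over in oldSegment) is omitted in this sketch
    trackedSegments.remove(oldSegment); // step 3
  }
}
```

Calling `replaceSegment("newSeg", "oldSeg")` and then `removeSegment("oldSeg")`, as the destroy path would, leaves `fullRemoveCalls` at 0 in this model, which is the no-op described above.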

For _enableDeletedKeysCompactionConsistency, it’s crucial to ensure that removeSegment(segment) is called to maintain consistency in the distinctSegmentCount value. This change overrides the replaceSegment method in the ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes class, ensuring that the removeSegment(segment) process is always executed at the end.
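One way to picture the override is sketched below. It is an illustration under assumptions, not the merged implementation: `BaseManagerSketch` and `ConsistentDeletesSketch` are hypothetical stand-ins for the two manager classes, segments are plain strings, and `removeSegmentRuns` marks the point where the per-key counts would be decremented.

```java
import java.util.HashSet;
import java.util.Set;

/** Stand-in for the base manager; behaviour is a simplified assumption. */
class BaseManagerSketch {
  final Set<String> trackedSegments = new HashSet<>();
  int removeSegmentRuns = 0; // hypothetical: counts effective removeSegment(segment) runs

  void addSegment(String segment) {
    trackedSegments.add(segment);
  }

  void removeSegment(String segment) {
    if (!trackedSegments.remove(segment)) {
      return; // untracked segment: nothing to do
    }
    removeSegmentRuns++; // where distinctSegmentCount would be decremented per key
  }

  void replaceSegment(String newSegment, String oldSegment) {
    addSegment(newSegment);
    trackedSegments.remove(oldSegment); // untracks without running removeSegment
  }
}

/** Stand-in for the consistent-deletes variant: always finishes with removeSegment. */
class ConsistentDeletesSketch extends BaseManagerSketch {
  @Override
  void replaceSegment(String newSegment, String oldSegment) {
    addSegment(newSegment);
    removeSegment(oldSegment); // runs while oldSegment is still tracked
  }
}
```

With the override, a later removeSegment call from the destroy path is still a no-op, but by then the decrement has already happened exactly once.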

Additionally, I updated the unit tests to track the value of distinctSegmentCount, which was missed in #13347.

After deploying this change, I verified that the deletedttlkeysinmultiplesegments metric dropped significantly, as shown in the screenshot below, indicating that the keys are now being deleted properly and consistently. Previously, distinctSegmentCount was only ever incremented and never decremented, so keys were not deleted as expected.

[Image: Screenshot 2024-08-30 at 12 29 16 AM]

codecov-commenter commented Aug 29, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 57.88%. Comparing base (59551e4) to head (7de6698).
Report is 1041 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...tionUpsertMetadataManagerForConsistentDeletes.java | 50.00% | 5 Missing and 6 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13914      +/-   ##
============================================
- Coverage     61.75%   57.88%   -3.87%     
- Complexity      207      219      +12     
============================================
  Files          2436     2617     +181     
  Lines        133233   143455   +10222     
  Branches      20636    22026    +1390     
============================================
+ Hits          82274    83044     +770     
- Misses        44911    53908    +8997     
- Partials       6048     6503     +455     

| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration2 | 0.00% <0.00%> (ø) |
| java-11 | 57.85% <50.00%> (-3.86%) ⬇️ |
| java-21 | 57.78% <50.00%> (-3.85%) ⬇️ |
| skip-bytebuffers-false | 57.88% <50.00%> (-3.87%) ⬇️ |
| skip-bytebuffers-true | 57.74% <50.00%> (+30.01%) ⬆️ |
| temurin | 57.88% <50.00%> (-3.87%) ⬇️ |
| unittests | 57.88% <50.00%> (-3.87%) ⬇️ |
| unittests1 | 40.72% <0.00%> (-6.17%) ⬇️ |
| unittests2 | 27.97% <50.00%> (+0.24%) ⬆️ |


@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review August 29, 2024 19:17
@tibrewalpratik17 tibrewalpratik17 force-pushed the fix_upsert_count_problem branch from 25b2d39 to b2e3a63 Compare August 30, 2024 08:29
// for full upsert, we are de-duping primary key once here to make sure that we are not adding
// primary-key multiple times and subtracting just once in removeSegment.
// for partial-upsert, we call this method in base class.
recordInfoIterator = resolveComparisonTies(recordInfoIterator, _hashFunction);

👍 make sense
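The de-duplication mentioned in the code comment above can be illustrated with a small sketch. It is not Pinot's resolveComparisonTies: the `Rec` type is hypothetical, and the selection rule (larger comparison value wins, ties go to the larger doc ID) is an assumption made for the example. The point is only that each primary key is emitted once per segment, so an add is matched by exactly one later remove.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class DedupSketch {
  /** Hypothetical record shape: primary key, comparison value, doc ID. */
  static final class Rec {
    final String primaryKey;
    final long comparisonValue;
    final int docId;

    Rec(String primaryKey, long comparisonValue, int docId) {
      this.primaryKey = primaryKey;
      this.comparisonValue = comparisonValue;
      this.docId = docId;
    }
  }

  /** Keeps one record per primary key, preserving first-seen key order. */
  static Iterator<Rec> dedupeByPrimaryKey(Iterator<Rec> records) {
    Map<String, Rec> winners = new LinkedHashMap<>();
    while (records.hasNext()) {
      Rec candidate = records.next();
      winners.merge(candidate.primaryKey, candidate, (current, next) -> {
        if (next.comparisonValue != current.comparisonValue) {
          // Assumed rule: the larger comparison value wins.
          return next.comparisonValue > current.comparisonValue ? next : current;
        }
        // Assumed tie-break: the larger doc ID wins.
        return next.docId > current.docId ? next : current;
      });
    }
    return winners.values().iterator();
  }
}
```

Feeding the result into the add path means a key that appears several times in one segment raises its count by 1 rather than once per occurrence.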

@tibrewalpratik17 tibrewalpratik17 merged commit c565a83 into apache:master Sep 17, 2024
21 checks passed
3 participants