-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce removeSegment flow with _enableDeletedKeysCompactionConsistency #13914
Enforce removeSegment flow with _enableDeletedKeysCompactionConsistency #13914
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13914 +/- ##
============================================
- Coverage 61.75% 57.88% -3.87%
- Complexity 207 219 +12
============================================
Files 2436 2617 +181
Lines 133233 143455 +10222
Branches 20636 22026 +1390
============================================
+ Hits 82274 83044 +770
- Misses 44911 53908 +8997
- Partials 6048 6503 +455
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
25b2d39
to
b2e3a63
Compare
.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java
Outdated
Show resolved
Hide resolved
// for full upsert, we are de-duping primary key once here to make sure that we are not adding | ||
// primary-key multiple times and subtracting just once in removeSegment. | ||
// for partial-upsert, we call this method in base class. | ||
recordInfoIterator = resolveComparisonTies(recordInfoIterator, _hashFunction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 make sense
When we enable
_enableDeletedKeysCompactionConsistency
, we want theremoveSegment
process to run to decrease thedistinctSegmentCount
by 1. Previously, in issue #13347, it was assumed that during thereplaceSegment
operation (whether during commit or refresh segment flow), theaddSegment
method would increase the count by 1, andremoveSegment
would decrease it by 1. However, it’s not guaranteed that theremoveSegment
process will be triggered during thereplaceSegment
operation.Here are the steps currently followed during replaceSegment in the
ConcurrentMapPartitionUpsertMetadataManager
class:removeSegment(oldSegment, remainingValidDocIDsSnapshotNotReplaced)
[Ref].During the destroy process of SegmentDataManager, the removeSegment(segment) method is called. However, this becomes a no-op since the segment has already been removed from _trackedSegments in step 3 [Ref].
For
_enableDeletedKeysCompactionConsistency
, it’s crucial to ensure thatremoveSegment(segment)
is called to maintain consistency in the distinctSegmentCount value. This change overrides the replaceSegment method in theConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes
class, ensuring that the removeSegment(segment) process is always executed at the end.Additionally, I updated the UTs to track the value of distinctSegmentCount, which was missed in #13347.
After deploying this change, I successfully tested that the
deletedttlkeysinmultiplesegments
metric decreased significantly, as shown in the screenshot below, indicating that the keys are now being deleted properly and consistently. Previously, the distinctSegmentCount was only increasing by 1 without decreasing, leading to the keys not being deleted as expected.