Implement Snapshot validation API for commits #14514
Open
+185
−98
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This change implements the ability to add custom
Snapshotvalidations to the existingSnapshotUpdateAPI.It focuses on reusing existing validation APIs in
SnapshotProducerand removing code duplication, while providing enough flexibility for clients, such as Kafka Connect and Flink.It is an alternative solution to #14509.
Why?
Custom
Snapshotvalidation is necessary for non-idempotent table update operations, which rely on the existing state for correctness and exactly-once delivery. Applications like Flink and Kafka Connect use snapshot properties to store their idempotence keys, which identify the base state during recovery. Due to the nature of concurrent commits in Iceberg, these applications need the ability to check information of the new base snapshots to identify idempotence violations. This change addresses this problem and allows clients to implement custom idempotence validations.How?
This change achieves the following:
validateWith(Consumer<Snapshot> snapshotValidator)method to theSnapshotUpdateto allow customSnapshotvalidations.validateFromSnapshot(long snapshotId)from child interfaces (OverwriteFiles,ReplacePartitions, andRowDelta, etc.) to the parentSnapshotUpdateand remove duplicate definitions.Snapshotfor validation, which is used by the Kafka Connect PR (Validate concurrent commits in DynamicIcebergSink to prevent commit duplication #14517) and other validations.SnapshotProducer::validate(TableMetadata currentMetadata, Snapshot snapshot), allowing child classes to inherit the validation functionality.This approach results in maximised code reuse and aligns with the existing validation functionality.
Impact?
This change is used as a parent in the following PRs: