Skip to content

Conversation

@aiborodin
Copy link
Contributor

@aiborodin aiborodin commented Nov 6, 2025

This change implements the ability to add custom Snapshot validations to the existing SnapshotUpdate API.

It focuses on reusing existing validation APIs in SnapshotProducer and removing code duplication, while providing enough flexibility for clients, such as Kafka Connect and Flink.

It is an alternative solution to #14509.

Why?

Custom Snapshot validation is necessary for non-idempotent table update operations, which rely on the existing state for correctness and exactly-once delivery. Applications like Flink and Kafka Connect use snapshot properties to store their idempotence keys, which identify the base state during recovery. Due to the nature of concurrent commits in Iceberg, these applications need the ability to check information of the new base snapshots to identify idempotence violations. This change addresses this problem and allows clients to implement custom idempotence validations.

How?

This change achieves the following:

  1. Add the new validateWith(Consumer<Snapshot> snapshotValidator) method to the SnapshotUpdate to allow custom Snapshot validations.
  2. Move the existing duplicate definitions of validateFromSnapshot(long snapshotId) from child interfaces (OverwriteFiles, ReplacePartitions, and RowDelta, etc.) to the parent SnapshotUpdate and remove duplicate definitions.
  1. Implement snapshot validation in the base class SnapshotProducer::validate(TableMetadata currentMetadata, Snapshot snapshot), allowing child classes to inherit the validation functionality.

This approach results in maximised code reuse and aligns with the existing validation functionality.

Impact?

This change is used as a parent in the following PRs:

  1. Implement commit validations for Kafka connect: Validate parent snapshots in Kafka Connect #14515
  2. Fix commit duplication issue in Flink: Validate concurrent commits in DynamicIcebergSink to prevent commit duplication #14517

Change-Id: Ie7e2bc0920bd855188c5bcc2435ace947db9af21
@aiborodin
Copy link
Contributor Author

@rdblue What do you think?

@rdblue
Copy link
Contributor

rdblue commented Nov 7, 2025

I just thought of another issue with this approach. This use of startingSnapshotId conflicts with other uses. Here, it determines the starting point of validation, which is where the table state was when the transaction started (because KC determined what to commit and the current offset from the current snapshot). In other places, it is used for the table version that was actually read -- which may not be current when the operation starts. I think that difference could lead to needing two separate values when combining history validation with other validations that use startingSnapshotId.

@aiborodin
Copy link
Contributor Author

aiborodin commented Nov 7, 2025

I think that difference could lead to needing two separate values when combining history validation with other validations that use startingSnapshotId.

Wouldn't both the history validation and other validations want to check the same set of parent snapshots?
I can't think of any use case where someone would want to validate different ranges of new base snapshots, and it feels like in both cases we should have the same API entry point for setting the starting snapshot.
What do you think?

@rdblue
Copy link
Contributor

rdblue commented Nov 7, 2025

I think that the starting snapshot ID is often based on what an operation read. This validation is likely going to be based on the current version when the transaction started. Since those are different, I would not reuse starting snapshot ID.

@aiborodin
Copy link
Contributor Author

This validation is likely going to be based on the current version when the transaction started.

But would it be reasonable for the operation that has read a different version (not the current one) to run the custom validation from that read version? Why would it want to run the custom validation from the current instead of the read version if it based all its update assumptions on that read version?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants