Description
Search before asking
- I searched in the issues and found nothing similar.
Motivation
createMetadataWithoutBase is called under certain conditions, such as:
- enabling Iceberg compatibility for the first time
- committing Iceberg metadata after a previous commit failed in the Iceberg layer (which can happen for many reasons)
In some cases the Paimon table already has many snapshots (e.g. snapshots 1 to 1000), yet after createMetadataWithoutBase is called, only the latest Paimon snapshot is synced to Iceberg. The earlier history never reaches the Iceberg side.
Syncing the whole Paimon history to Iceberg would make the Iceberg compatibility feature suitable for production use cases that require Iceberg time travel.
My use case is this:
- Sync MySQL tables to Paimon using Flink CDC
- Tag daily Paimon snapshots automatically
- Iceberg readers read daily snapshots
When a failure happens in the Iceberg committer, the metadata has to be recreated on the next commit, and only the latest Paimon snapshot is included. As a result, the daily-snapshot part of this pipeline breaks: the earlier daily snapshots are no longer visible to Iceberg readers and cannot be recovered.
Solution
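A simple solution may exist. Assume the Paimon table holds snapshots in the range [x, y]. When metadata must be created without a base:
- call createMetadataWithoutBase for the earliest snapshot, x
- then, for i = x + 1; i <= y; i++, call createMetadataWithBase(i) (starting at x + 1, because snapshot x already has metadata from the previous step)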
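The replay loop described above could look roughly like the sketch below. It is not Paimon's real code: the IcebergMetadataWriter interface and the IcebergHistoryBackfill class are hypothetical stand-ins, and the two method names only mirror the ones mentioned in this issue, whose actual signatures inside Paimon may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Paimon's Iceberg commit logic; not the real API. */
interface IcebergMetadataWriter {
    // Builds fresh Iceberg metadata for one Paimon snapshot, ignoring any earlier metadata.
    void createMetadataWithoutBase(long snapshotId);

    // Builds Iceberg metadata for one Paimon snapshot on top of the previous Iceberg metadata.
    void createMetadataWithBase(long snapshotId);
}

/** Hypothetical helper that replays a Paimon snapshot range into Iceberg. */
final class IcebergHistoryBackfill {
    private IcebergHistoryBackfill() {}

    /**
     * Replays snapshots [earliest, latest]: the earliest snapshot seeds fresh
     * metadata, and every later snapshot is committed on top of the metadata
     * produced for its predecessor.
     */
    static void backfill(IcebergMetadataWriter writer, long earliest, long latest) {
        if (earliest > latest) {
            throw new IllegalArgumentException("empty snapshot range: " + earliest + ".." + latest);
        }
        writer.createMetadataWithoutBase(earliest);
        // Start at earliest + 1: snapshot `earliest` already has metadata from the call above.
        for (long id = earliest + 1; id <= latest; id++) {
            writer.createMetadataWithBase(id);
        }
    }
}
```

For the example range in this issue, backfill(writer, 1, 1000) would issue one without-base call for snapshot 1 followed by 999 with-base calls for snapshots 2 to 1000.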
Anything else?
Consider making this feature opt-in via configuration, since syncing many Paimon snapshots to Iceberg may be costly and could push the commit past the Flink checkpoint timeout.
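One way the opt-in could be wired is sketched below. The option key metadata.iceberg.sync-full-history and the IcebergSyncOptions class are purely hypothetical (Paimon has no such option as far as this issue states), and the table options are modeled as a plain Map rather than Paimon's own options classes.

```java
import java.util.Map;

/** Hypothetical opt-in switch for syncing the full Paimon history to Iceberg. */
final class IcebergSyncOptions {
    // Hypothetical option key, used here only for illustration.
    static final String SYNC_FULL_HISTORY = "metadata.iceberg.sync-full-history";

    private IcebergSyncOptions() {}

    /** Off by default, so existing jobs keep the current latest-snapshot-only behaviour. */
    static boolean syncFullHistory(Map<String, String> tableOptions) {
        return Boolean.parseBoolean(tableOptions.getOrDefault(SYNC_FULL_HISTORY, "false"));
    }

    /** First snapshot to replay when Iceberg metadata must be rebuilt without a base. */
    static long firstSnapshotToSync(Map<String, String> tableOptions, long earliest, long latest) {
        return syncFullHistory(tableOptions) ? earliest : latest;
    }
}
```

Defaulting to false keeps the expensive replay away from jobs that never asked for it, so only tables that need Iceberg time travel pay the extra cost during recovery.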
Are you willing to submit a PR?
- I'm willing to submit a PR!