[Feature] [core] Iceberg: createMetadataWithoutBase should include full history of Paimon snapshots #6107

@nickdelnano

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Code path: https://github.com/apache/paimon/blob/ff02f6bf3ceccf8dcac38bc58cf6db390509bd46/paimon-core/src/main/java/org/apache/paimon/iceberg/IcebergCommitCallback.java#L305C49-L305C59

createMetadataWithoutBase is called under certain conditions, such as:

  • enabling Iceberg compatibility for the first time
  • committing Iceberg metadata after a previous commit failed in the Iceberg layer (for many possible reasons)

In some cases the Paimon table already has many snapshots (e.g. snapshots 1 to 1000), and after createMetadataWithoutBase is called, only the latest Paimon snapshot is synced to Iceberg.

Syncing the whole Paimon history to Iceberg would make the Iceberg compatibility feature suitable for production use cases that require Iceberg time travel.

My use case is this:

  • Sync MySQL tables to Paimon using Flink CDC
  • Tag daily Paimon snapshots automatically
  • Iceberg readers read daily snapshots

When a failure happens in the Iceberg committer, the metadata must be recreated on the next commit, and only the latest Paimon snapshot is included. The earlier daily snapshots are then missing from Iceberg and cannot be recovered, which breaks the daily snapshot part of this pipeline.

Solution

A simple solution may exist. Assume snapshot range [x, y]:

If creating metadata without base:

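  • call createMetadataWithoutBase for the earliest snapshot, x
  • for (i = x + 1; i <= y; i++) call createMetadataWithBase(i), so that each later snapshot is added on top of the metadata written for the previous one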
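
The two steps above can be sketched as follows. This is only an illustration of the call order, not the actual IcebergCommitCallback code: the MetadataWriter interface, its method signatures, and the class name FullHistoryReplay are hypothetical stand-ins, and the real createMetadataWithoutBase / createMetadataWithBase methods take different arguments.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: replay Paimon snapshots x..y into Iceberg metadata. */
public class FullHistoryReplay {

    /** Assumed stand-in for the metadata-writing methods; not the real Paimon API. */
    interface MetadataWriter {
        void createWithoutBase(long snapshotId);
        void createWithBase(long snapshotId, long baseSnapshotId);
    }

    /** Writes metadata for the earliest snapshot, then layers each later one on its predecessor. */
    static void replay(long earliest, long latest, MetadataWriter writer) {
        if (earliest > latest) {
            throw new IllegalArgumentException("empty snapshot range");
        }
        writer.createWithoutBase(earliest);
        // Start at earliest + 1: the earliest snapshot is already covered above.
        for (long i = earliest + 1; i <= latest; i++) {
            writer.createWithBase(i, i - 1);
        }
    }

    /** Records the calls made by replay so the order can be inspected. */
    static List<String> trace(long earliest, long latest) {
        List<String> calls = new ArrayList<>();
        replay(earliest, latest, new MetadataWriter() {
            @Override
            public void createWithoutBase(long id) {
                calls.add("withoutBase(" + id + ")");
            }

            @Override
            public void createWithBase(long id, long base) {
                calls.add("withBase(" + id + " on " + base + ")");
            }
        });
        return calls;
    }

    public static void main(String[] args) {
        // prints [withoutBase(3), withBase(4 on 3), withBase(5 on 4), withBase(6 on 5)]
        System.out.println(trace(3, 6));
    }
}
```

When x equals y the loop body never runs, so the behavior matches what happens today (a single snapshot synced without a base).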

Anything else?

Consider making this feature opt-in through configuration: syncing many Paimon snapshots to Iceberg may be costly and could exceed the Flink checkpoint timeout.
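
One way the opt-in could look is sketched below. Everything here is an assumption for illustration: the flag fullHistoryEnabled and the class ReplayPlan are hypothetical names, and the maxSnapshots cap is an extra idea that is not part of the proposal above; it is shown only as one possible way to bound the work done inside a single checkpoint.

```java
/** Hypothetical sketch: decide where a metadata rebuild should start. */
public class ReplayPlan {

    /**
     * Returns the id of the first snapshot to sync when rebuilding without a base.
     * With the (assumed) flag off, only the latest snapshot is synced, as today.
     * With it on, at most maxSnapshots of the most recent snapshots are replayed.
     */
    static long firstSnapshotToSync(
            boolean fullHistoryEnabled, long earliest, long latest, long maxSnapshots) {
        if (!fullHistoryEnabled || maxSnapshots <= 1) {
            return latest;
        }
        // Never go below the earliest snapshot that still exists.
        return Math.max(earliest, latest - maxSnapshots + 1);
    }

    public static void main(String[] args) {
        System.out.println(firstSnapshotToSync(false, 1, 1000, 100)); // prints 1000
        System.out.println(firstSnapshotToSync(true, 1, 1000, 100));  // prints 901
        System.out.println(firstSnapshotToSync(true, 1, 50, 100));    // prints 1
    }
}
```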

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels: enhancement