Core: Add reference snapshot ID/timestamps to AllEntriesTable and AllManifestsTable #9335

Closed

Contributor

@hsiang-c hsiang-c commented Dec 19, 2023

Note to reviewers

  • Closes Improve All Metadata Tables with Snapshot Information #8856
  • Instead of returning ManifestFile in BaseAllMetadataTableScan::reachableManifests, we return a Pair<Snapshot, ManifestFile> from all snapshots.
  • REF_SNAPSHOT_ID is used by AllManifestsTable already, so I chose this terminology instead of AS_OF_SNAPSHOT. I am open to other names as well.
  • There are a few unit tests in Spark/Flink where the order of the returned values was not guaranteed. Therefore, I sorted the expected/actual values by file_path.
  • In the original PR, Russell commented

These would allow us to analyze the actual history in all_entries and all_manifests

I also used Pair<Snapshot, ManifestFile> for all_* tables other than all_entries and all_manifests but I'm not sure if it is necessary. Please let me know.

Sample output

  • Insert 4 times, delete all of them, then insert one more time.
  • Output is ordered by reference_snapshot_timestamp_millis
+-------------------+---------------------+-----------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|snapshot_id        |reference_snapshot_id|reference_snapshot_timestamp_millis|status|data_file                                                                                                                                                                                                                                                                                                                   |
+-------------------+---------------------+-----------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|4341802652067419299|4341802652067419299  |1725425584897                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-0-9d6311fb-28da-48e0-b9fa-c8dfd32748b1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|4341802652067419299|971355134943407989   |1725425585080                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-0-9d6311fb-28da-48e0-b9fa-c8dfd32748b1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|971355134943407989 |971355134943407989   |1725425585080                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-1-65b2a2db-fc1d-44ee-85d5-2e73a7826d33-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|971355134943407989 |8947587060401590385  |1725425585198                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-1-65b2a2db-fc1d-44ee-85d5-2e73a7826d33-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|4341802652067419299|8947587060401590385  |1725425585198                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-0-9d6311fb-28da-48e0-b9fa-c8dfd32748b1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|8947587060401590385|8947587060401590385  |1725425585198                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-2-c6c84ab2-06b9-4d49-8239-460e448c3c35-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|8947587060401590385|2421236859611202602  |1725425585316                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-2-c6c84ab2-06b9-4d49-8239-460e448c3c35-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|971355134943407989 |2421236859611202602  |1725425585316                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-1-65b2a2db-fc1d-44ee-85d5-2e73a7826d33-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|2421236859611202602|2421236859611202602  |1725425585316                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-3-d3748dc0-4708-4a16-820d-81e2214765c1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|4341802652067419299|2421236859611202602  |1725425585316                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-0-9d6311fb-28da-48e0-b9fa-c8dfd32748b1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|4324292129886250863|4324292129886250863  |1725425585433                      |2     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-2-c6c84ab2-06b9-4d49-8239-460e448c3c35-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|4324292129886250863|4324292129886250863  |1725425585433                      |2     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-1-65b2a2db-fc1d-44ee-85d5-2e73a7826d33-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|4324292129886250863|4324292129886250863  |1725425585433                      |2     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-0-9d6311fb-28da-48e0-b9fa-c8dfd32748b1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [61]}, {1 -> [01 00 00 00], 2 -> [61]}, NULL, [4], NULL, 0}|
|4324292129886250863|4324292129886250863  |1725425585433                      |2     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-3-d3748dc0-4708-4a16-820d-81e2214765c1-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
|2788755331636060629|2788755331636060629  |1725425585574                      |1     |{0, file:/var/folders/vh/xgbp56d10b72cjzlng9h8krc0000gn/T/junit12937496981447592132/data/00000-4-9ba23c37-daaf-4e93-a9bb-5fd06b954fb2-0-00001.parquet, PARQUET, 0, 1, 614, {1 -> 42, 2 -> 43}, {1 -> 1, 2 -> 1}, {1 -> 0, 2 -> 0}, {}, {1 -> [01 00 00 00], 2 -> [62]}, {1 -> [01 00 00 00], 2 -> [62]}, NULL, [4], NULL, 0}|
+-------------------+---------------------+-----------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

@hsiang-c hsiang-c marked this pull request as draft December 19, 2023 00:18
Collaborator

@szehon-ho szehon-ho left a comment

Thanks for the work, left some preliminary comments

CloseableIterable<ManifestFile> manifests =
reachableManifests(snapshot -> snapshot.allManifests(table().io()));
return BaseEntriesTable.planFiles(table(), manifests, tableSchema(), schema(), context());
return new ParallelIterable<>(
Collaborator

Looks like we don't close the ParallelIterable like in the original reachableManifests method.

Also looks like we are going to call planFiles many more times than the original.

My first thought is to keep the logic but use a Pair<ManifestFile, Snapshot> where ManifestFile is used now. How about adapting the correct reachableManifests code into a new, more generic method like

T traverse(Function<Snapshot, T> func)

and have reachableManifests use that?
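
A minimal sketch of the generic traversal suggested above, under stated assumptions: `Snapshot` and `SnapshotManifest` below are hypothetical stand-in records rather than Iceberg's classes, manifests are represented by their paths, the helper takes a function returning an `Iterable<T>` instead of a bare `T`, and the parallel execution and closing of the real `ParallelIterable` are left out. It only illustrates how one traversal helper might back both the existing de-duplicated manifest listing and a snapshot-aware variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class TraverseSketch {
  // Hypothetical stand-in for a snapshot: an id plus the paths of its manifests.
  record Snapshot(long id, List<String> manifestPaths) {}

  // Hypothetical stand-in for a (snapshot, manifest) pair.
  record SnapshotManifest(long snapshotId, String manifestPath) {}

  // Applies func to every snapshot, flattens the results, and de-duplicates
  // them while preserving first-seen order.
  static <T> Set<T> traverse(List<Snapshot> snapshots, Function<Snapshot, Iterable<T>> func) {
    Set<T> seen = new LinkedHashSet<>();
    for (Snapshot snapshot : snapshots) {
      for (T item : func.apply(snapshot)) {
        seen.add(item);
      }
    }
    return seen;
  }

  // Existing behaviour: each manifest appears once, however many snapshots reference it.
  static Set<String> reachableManifests(List<Snapshot> snapshots) {
    return traverse(snapshots, Snapshot::manifestPaths);
  }

  // Snapshot-aware variant: each (snapshot, manifest) combination appears once.
  static Set<SnapshotManifest> manifestsWithSnapshot(List<Snapshot> snapshots) {
    return traverse(snapshots, snapshot -> {
      List<SnapshotManifest> pairs = new ArrayList<>();
      for (String path : snapshot.manifestPaths()) {
        pairs.add(new SnapshotManifest(snapshot.id(), path));
      }
      return pairs;
    });
  }

  public static void main(String[] args) {
    List<Snapshot> snapshots =
        List.of(new Snapshot(1L, List.of("m1")), new Snapshot(2L, List.of("m1", "m2")));
    System.out.println(reachableManifests(snapshots)); // [m1, m2]
    System.out.println(manifestsWithSnapshot(snapshots).size()); // 3
  }
}
```

Because the pair includes the snapshot id, the same manifest referenced by two snapshots yields two distinct results, whereas the path-only traversal collapses them into one.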

Contributor Author

Looks like we don't close the ParallelIterable like in the original reachableManifests method.

Good catch! Let me fix it.

Also looks like we are going to call planFiles many more times than the original.

Let me think about it, thanks.

@@ -174,6 +175,8 @@ public static MetricsModes.MetricsMode metricsMode(
field.type(), file.upperBounds().get(field.fieldId()))));

public static final String READABLE_METRICS = "readable_metrics";
public static final String REF_SNAPSHOT_ID = "reference_snapshot_id";
Collaborator

I don't think this constant belongs in this file, which is for metrics. If we need a constant, it should be in BaseMetadataTable. Otherwise we can just hard-code it, which seems simpler.

Collaborator

I still feel we should move this one.

@hsiang-c
Contributor Author

Hello @RussellSpitzer,

@szehon-ho and I had a discussion about adopting the reachableManifests method.

If I understand #8856 correctly, once we associate as_of_snapshot to manifest files while querying "all_*" versions of metadata tables (e.g. all_entries), each (as_of_snapshot, manifest_file) entry becomes unique and de-duplication is no longer a must.

Please let us know if that's what you're looking for, thank you.

  protected CloseableIterable<ManifestFile> reachableManifests(
      Function<Snapshot, Iterable<ManifestFile>> toManifests) {
    Iterable<Snapshot> snapshots = table().snapshots();
    Iterable<Iterable<ManifestFile>> manifestIterables =
        Iterables.transform(snapshots, toManifests);

    try (CloseableIterable<ManifestFile> iterable =
        new ParallelIterable<>(manifestIterables, planExecutor())) {
      return CloseableIterable.withNoopClose(Sets.newHashSet(iterable)); // de-dup `ManifestFile`
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to close parallel iterable", e);
    }
  }

@szehon-ho
Collaborator

Hi @hsiang-c, we took another look with @RussellSpitzer. It seems the manifests are de-duped on the traversal down, but the entries (referred to by manifests) should not be de-duped. wdyt?

@szehon-ho
Collaborator

szehon-ho commented Jan 25, 2024

Clarified with @hsiang-c . Will draw a diagram to illustrate the problem.

Imagine the following graph:

Snapshot1 -> Manifest1 -> Entry1
Snapshot2 -> Manifest1 -> Entry1
Snapshot3 -> Manifest1 -> Entry1

Notice all three snapshots point to the same manifest file. So, given the dedup mechanism we have of traversing manifest files, and assuming first-in-first-out, we will only be left with an entry like:

entry   as_of_snapshot
Entry1  Snapshot1

This seems fine to me, as the word 'as_of_snapshot' makes it seem like we want to know when each entry first came into the picture. It does not seem necessary to change the behavior of the table and list every single snapshot that refers to the entry in this table. If an entry changes state (i.e., EXISTING to DELETED), we will see another entry in the table, because it is a completely new entry (in a new manifest). cc @RussellSpitzer @hsiang-c for thoughts.
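
The graph above can be sketched as follows. This is an illustration only: `Ref` is a hypothetical record, and the insertion-ordered map stands in for the "first-in-first-out" assumption stated above; the quoted `reachableManifests` code collects into a hash set from a parallel iterable, so which snapshot survives there is not necessarily the first.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AsOfSnapshotDedup {
  // Hypothetical stand-in: one edge of the graph, Snapshot -> Manifest.
  record Ref(String snapshot, String manifest) {}

  // De-duplicates by manifest and remembers only the first snapshot that referenced it.
  static Map<String, String> firstSnapshotPerManifest(List<Ref> refs) {
    Map<String, String> asOfSnapshot = new LinkedHashMap<>();
    for (Ref ref : refs) {
      asOfSnapshot.putIfAbsent(ref.manifest(), ref.snapshot());
    }
    return asOfSnapshot;
  }

  public static void main(String[] args) {
    List<Ref> refs =
        List.of(
            new Ref("Snapshot1", "Manifest1"),
            new Ref("Snapshot2", "Manifest1"),
            new Ref("Snapshot3", "Manifest1"));
    // Only the first reference survives the de-duplication.
    System.out.println(firstSnapshotPerManifest(refs)); // {Manifest1=Snapshot1}
  }
}
```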

@hsiang-c hsiang-c force-pushed the add-ref-snapshot-all-entries-manifest branch from bf0f977 to 43e4cb9 Compare August 25, 2024 11:19
@github-actions github-actions bot added the flink label Aug 25, 2024
@hsiang-c hsiang-c force-pushed the add-ref-snapshot-all-entries-manifest branch from 43e4cb9 to e6b7e24 Compare August 25, 2024 17:15
@hsiang-c hsiang-c force-pushed the add-ref-snapshot-all-entries-manifest branch 4 times, most recently from 8e6ce73 to 0b3caef Compare September 3, 2024 11:08
@hsiang-c hsiang-c marked this pull request as ready for review September 3, 2024 13:08
@hsiang-c
Contributor Author

hsiang-c commented Sep 3, 2024

@szehon-ho @RussellSpitzer Please take a look for me, thanks.

List<GenericData.Record> expectedFiles =
ListUtils.union(expectedDataFiles, expectedDeleteFiles);
expectedFiles.sort(Comparator.comparing(r -> ((Integer) r.get("content"))));
assertThat(actualFiles).hasSize(3);
expectedFiles.sort(Comparator.comparing(r -> r.get("file_path").toString()));
Contributor Author

@hsiang-c hsiang-c Sep 3, 2024

Sorting both the actual and the expected files by file_path puts the two lists in the same deterministic order.

List<Row> actualFiles = TestHelpers.selectNonDerived(actualFilesDs).collectAsList();
Schema entriesTableSchema = Spark3Util.loadIcebergTable(spark, tableName + ".entries").schema();
List<ManifestFile> expectedDataManifests = TestHelpers.dataManifests(table);
List<Record> expectedFiles =
expectedEntries(table, FileContent.DATA, entriesTableSchema, expectedDataManifests, null);
expectedFiles.sort(Comparator.comparing(r -> r.get("file_path").toString()));
Contributor Author

Sorting both the actual and the expected files by file_path puts the two lists in the same deterministic order.
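
A small, self-contained sketch of the idea in these test changes: when a scan does not guarantee row order, sort both sides by the same key before comparing. Rows are modelled here as plain maps, which is an assumption for illustration; the actual tests operate on Iceberg records and Spark rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortBeforeCompare {
  // Returns a copy of the rows ordered by their "file_path" value.
  static List<Map<String, Object>> sortedByFilePath(List<Map<String, Object>> rows) {
    List<Map<String, Object>> copy = new ArrayList<>(rows);
    copy.sort(Comparator.comparing(row -> row.get("file_path").toString()));
    return copy;
  }

  public static void main(String[] args) {
    Map<String, Object> a = Map.of("file_path", "data/a.parquet");
    Map<String, Object> b = Map.of("file_path", "data/b.parquet");
    List<Map<String, Object>> expected = List.of(a, b);
    List<Map<String, Object>> actual = List.of(b, a); // same rows, different order

    System.out.println(expected.equals(actual)); // false
    System.out.println(sortedByFilePath(expected).equals(sortedByFilePath(actual))); // true
  }
}
```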

@@ -1136,6 +1136,78 @@ acceptedBreaks:
new: "method org.apache.iceberg.BaseMetastoreOperations.CommitStatus org.apache.iceberg.BaseMetastoreTableOperations::checkCommitStatus(java.lang.String,\
\ org.apache.iceberg.TableMetadata)"
justification: "Removing deprecated code"
- code: "java.method.returnTypeTypeParametersChanged"
Collaborator

Unfortunately I think we have to keep the old APIs for one release. Let's just make a new method then. Hopefully we can re-use logic between these.

Contributor Author

@szehon-ho Sounds good, let me make a new method.

Contributor Author

@szehon-ho I added a new snapshotManifestPairs() to the base classes.
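
One way the "keep the old API, add a new method, share the logic" approach could look is sketched below. All names here (`CompatibleScanSketch`, `SnapshotManifest`, the `of` factory) are hypothetical and manifests are reduced to path strings; this is not the code in the PR, only an illustration of an old method delegating to its replacement.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

abstract class CompatibleScanSketch {
  // Hypothetical stand-in for a (snapshot, manifest) pair.
  record SnapshotManifest(long snapshotId, String manifestPath) {}

  // New method: subclasses report every (snapshot, manifest) combination.
  abstract List<SnapshotManifest> manifestsWithSnapshot();

  // Old method, kept for compatibility. It reuses the new method and drops the
  // snapshot, so each manifest path is returned once.
  @Deprecated
  Set<String> manifests() {
    Set<String> paths = new LinkedHashSet<>();
    for (SnapshotManifest pair : manifestsWithSnapshot()) {
      paths.add(pair.manifestPath());
    }
    return paths;
  }

  // Convenience factory used only to make this sketch runnable.
  static CompatibleScanSketch of(List<SnapshotManifest> pairs) {
    return new CompatibleScanSketch() {
      @Override
      List<SnapshotManifest> manifestsWithSnapshot() {
        return pairs;
      }
    };
  }

  public static void main(String[] args) {
    CompatibleScanSketch scan =
        of(List.of(new SnapshotManifest(1L, "m1"), new SnapshotManifest(2L, "m1")));
    System.out.println(scan.manifestsWithSnapshot().size()); // 2
    System.out.println(scan.manifests()); // [m1]
  }
}
```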

@hsiang-c hsiang-c force-pushed the add-ref-snapshot-all-entries-manifest branch from 0814a8b to 8c22f5c Compare October 1, 2024 06:42
@@ -66,5 +68,14 @@ protected TableScan newRefinedScan(Table table, Schema schema, TableScanContext
protected CloseableIterable<ManifestFile> manifests() {
return reachableManifests(snapshot -> snapshot.dataManifests(table().io()));
}

@Override
protected CloseableIterable<Pair<Snapshot, ManifestFile>> snapshotManifestPairs() {
Collaborator

Can we deprecate the other one?

Collaborator

Also, just my opinion on the name, but to me "Pairs" is an implementation detail. Maybe we can call it manifestsWithSnapshot, which describes the result more logically.

@@ -39,6 +39,8 @@ class AllManifestsTableTaskParser {
private static final String MANIFEST_LIST_LOCATION = "manifest-list-Location";
private static final String RESIDUAL = "residual-filter";
private static final String REFERENCE_SNAPSHOT_ID = "reference-snapshot-id";
private static final String REFERENCE_SNAPSHOT_TIMESTAMP_MILLIS =
Collaborator

While it's descriptive, it's much longer than the other column names. How about combining it with @RussellSpitzer's original suggestion and using 'reference-snapshot-time'?

Table table, ManifestFile manifest, Schema projection, Expression filter) {
this(table.schema(), table.io(), table.specs(), manifest, projection, filter);
Table table,
Pair<Snapshot, ManifestFile> snapshotManifestPair,
Collaborator

Can we just add an argument Snapshot, instead of passing pair?

}

@Override
public <T> T get(int pos, Class<T> javaClass) {
Collaborator

Won't this fail for Flink, where the positions are not always at the end? See https://github.com/apache/iceberg/pull/6222/files

https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MetricsUtil.java#L441

I think we should make a generic struct for this scenario now that we have so many fields, i.e., have a position map, kind of like https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/util/StructProjection.java. Do you think it's possible?
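
A rough sketch of what such a position-mapped struct might look like, assuming rows are plain object arrays. `PositionMappedRow` and its encoding of the map (non-negative values index the underlying row, negative values index the extra fields) are hypothetical choices for illustration, not Iceberg's `StructProjection` API.

```java
final class PositionMappedRow {
  private final Object[] base; // fields of the underlying row
  private final Object[] extras; // appended fields, e.g. reference snapshot id
  // For each projected position: a value >= 0 is an index into base,
  // a value < 0 encodes index (-value - 1) into extras.
  private final int[] positionMap;

  PositionMappedRow(Object[] base, Object[] extras, int[] positionMap) {
    this.base = base;
    this.extras = extras;
    this.positionMap = positionMap;
  }

  int size() {
    return positionMap.length;
  }

  // Looks up the projected position through the map, so extra fields may
  // appear anywhere in the projection rather than only at the end.
  <T> T get(int pos, Class<T> javaClass) {
    int mapped = positionMap[pos];
    Object value = mapped >= 0 ? base[mapped] : extras[-mapped - 1];
    return javaClass.cast(value);
  }

  public static void main(String[] args) {
    // Projection order: file path, reference snapshot id, status.
    PositionMappedRow row =
        new PositionMappedRow(
            new Object[] {"data.parquet", 1}, new Object[] {42L}, new int[] {0, -1, 1});
    System.out.println(row.get(1, Long.class)); // 42
    System.out.println(row.get(2, Integer.class)); // 1
  }
}
```

The point of the map is that `get` no longer assumes the added columns sit after all base columns; an engine that projects them into the middle of the row still resolves each position correctly.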


This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 18, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 26, 2025