[Kernel][Metrics][PR#2] Add SnapshotReport for reporting snapshot construction #3903

allisonport-db · 2024-11-26T19:43:17Z

Which Delta project/connector is this regarding?

Description

Adds a SnapshotReport for reporting snapshot construction.

We record a SnapshotReport after a snapshot is successfully constructed, or when an exception is thrown during construction. A SnapshotContext propagates and updates information about the snapshot construction. For example, we update the "version" as soon as it is resolved, both for time-travel-by-timestamp queries and for load-latest-snapshot queries. As a result, if an exception is thrown, the report includes as much information as was available at that point.
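
To illustrate the idea of updating a context as construction progresses, here is a minimal sketch. It is not the code in this PR: `QueryContext`, `Report`, `resolveVersion`, `loadProtocolAndMetadata`, and the in-memory `reports` list are all hypothetical stand-ins, and the resolved version `2` is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these names are the Kernel API. */
public class SnapshotContextSketch {

  /** Mutable holder for what is known about the snapshot query so far. */
  static final class QueryContext {
    final String tablePath;
    final Optional<Long> providedTimestamp;
    Optional<Long> version = Optional.empty(); // filled in once resolved

    QueryContext(String tablePath, Optional<Long> providedTimestamp) {
      this.tablePath = tablePath;
      this.providedTimestamp = providedTimestamp;
    }
  }

  /** Immutable report built from the context on success or on failure. */
  static final class Report {
    final String tablePath;
    final Optional<Long> version;
    final Optional<Long> providedTimestamp;
    final Optional<Exception> exception;

    Report(QueryContext ctx, Optional<Exception> exception) {
      this.tablePath = ctx.tablePath;
      this.version = ctx.version;
      this.providedTimestamp = ctx.providedTimestamp;
      this.exception = exception;
    }
  }

  /** Stand-in for the engine's reporters: reports are just collected here. */
  static final List<Report> reports = new ArrayList<>();

  /** Stand-in for timestamp -> version resolution (assumed to succeed). */
  static long resolveVersion(long timestampMillis) {
    return 2L;
  }

  /** Stand-in for loading protocol and metadata; fails on request. */
  static void loadProtocolAndMetadata(boolean fail) {
    if (fail) {
      throw new IllegalStateException("could not load protocol and metadata");
    }
  }

  static void buildSnapshotAsOfTimestamp(String tablePath, long timestampMillis, boolean failLate) {
    QueryContext ctx = new QueryContext(tablePath, Optional.of(timestampMillis));
    try {
      // Record the version as soon as it is known.
      ctx.version = Optional.of(resolveVersion(timestampMillis));
      loadProtocolAndMetadata(failLate);
      reports.add(new Report(ctx, Optional.empty()));
    } catch (RuntimeException e) {
      // The error report still carries whatever the context held at failure time.
      reports.add(new Report(ctx, Optional.of(e)));
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      buildSnapshotAsOfTimestamp("/tmp/table", 1700000000000L, true);
    } catch (IllegalStateException expected) {
      // The failure is rethrown to the caller after the report is recorded.
    }
    Report r = reports.get(0);
    System.out.println("version=" + r.version + " failed=" + r.exception.isPresent());
  }
}
```

In this sketch the failure happens after version resolution, so the error report still includes the resolved version.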

How was this patch tested?

Adds a test suite MetricsReportSuite with unit tests.

Does this PR introduce any user-facing changes?

No.

scottsand-db

Looks great! Reviewed the production code, not the tests yet

kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotContext.java

scottsand-db · 2024-12-04T18:42:26Z

kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionBuilderImpl.java

@@ -105,8 +107,11 @@ public Transaction build(Engine engine) {
      // Table doesn't exist yet. Create an initial snapshot with the new schema.
      Metadata metadata = getInitialMetadata();
      Protocol protocol = getInitialProtocol();
-      LogReplay logReplay = getEmptyLogReplay(engine, metadata, protocol);
-      snapshot = new InitialSnapshot(table.getDataPath(), logReplay, metadata, protocol);
+      SnapshotContext snapshotContext = SnapshotContext.forVersionSnapshot(tablePath, 0);


What are the semantics of the SnapshotContext version? i.e. is it the read version? the attempted write version? Is there a difference or does it matter?

For example, with this code that uses forVersionSnapshot(..., 0) ... we get a snapshotContext that is identical to the case where a table has 3 commits (0, 1, 2) and you time travel to read the table at version 0.

Curious what you think about this?

I updated this. My mistake: this should've been -1 (which matches the version returned by the log segment & log replay).

In general it is the intended read version. For this specific case, we treat a non-existent table as read version -1. Note, we don't generate a snapshot report for this case since there is no real snapshot to load!

kernel/kernel-api/src/main/java/io/delta/kernel/internal/metrics/SnapshotReportImpl.java

kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java

scottsand-db · 2024-12-04T18:49:06Z

kernel/kernel-api/src/main/java/io/delta/kernel/internal/snapshot/SnapshotManager.java

+            snapshotContext);
+
+    // Push snapshot report to engine
+    engine.getMetricsReporters().forEach(reporter -> reporter.report(snapshot.getSnapshotReport()));


Question: in general, when should we be pushing to the engine?

I think as soon as the given operation is completed. In this case, as soon as a snapshot is successfully constructed. I viewed this to essentially mean as soon as the metadata/protocol is loaded.
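
A rough sketch of that timing, for illustration only: the report is pushed to every reporter right after the snapshot has been constructed. The nested `MetricsReport`, `MetricsReporter`, `SnapshotReport`, and `CollectingReporter` types and the `loadSnapshot` method are simplified stand-ins defined here, not the Kernel classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pushing a report once an operation completes. */
public class ReportPushSketch {

  /** Stand-in marker type for any report. */
  interface MetricsReport {}

  /** Stand-in for an engine-provided reporter. */
  interface MetricsReporter {
    void report(MetricsReport report);
  }

  static final class SnapshotReport implements MetricsReport {
    final long version;

    SnapshotReport(long version) {
      this.version = version;
    }
  }

  /** Test-style reporter that just remembers what it was given. */
  static final class CollectingReporter implements MetricsReporter {
    final List<MetricsReport> received = new ArrayList<>();

    @Override
    public void report(MetricsReport report) {
      received.add(report);
    }
  }

  /** Builds the (pretend) snapshot, then pushes one report to each reporter. */
  static SnapshotReport loadSnapshot(long version, List<MetricsReporter> reporters) {
    // ... protocol and metadata would be loaded here ...
    SnapshotReport report = new SnapshotReport(version);
    // Push as soon as the snapshot has been constructed successfully.
    reporters.forEach(reporter -> reporter.report(report));
    return report;
  }

  public static void main(String[] args) {
    CollectingReporter collector = new CollectingReporter();
    List<MetricsReporter> reporters = new ArrayList<>();
    reporters.add(collector);
    loadSnapshot(0L, reporters);
    System.out.println("reports received: " + collector.received.size());
  }
}
```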

kernel/kernel-api/src/main/java/io/delta/kernel/metrics/SnapshotMetricsResult.java

scottsand-db

Just posting what we discussed in a sync! Looks great!

Would be great to use factory methods to create SnapshotReportImpls. One for success/normal cases, one for error case. This should take in the SnapshotQueryContext.
Ensure you have the same exact method / value names between SnapshotMetrics and SnapshotMetricsResult. Also, add docs to both that explain the relationship. i.e. SnapshotMetrics creates SnapshotMetricsResult; SnapshotMetricsResult is created by SnapshotMetrics
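
One possible shape for these two suggestions, as a sketch only: a report type with one factory method per case, and a mutable metrics class whose immutable result exposes the same name. `QueryContext`, `Metrics`, `MetricsResult`, `Report`, and the `loadDurationNs` metric are hypothetical names chosen for illustration, not the names used in the PR.

```java
import java.util.Optional;

/** Hypothetical sketch; the real classes in the PR may look different. */
public class SnapshotReportFactorySketch {

  /** Stand-in for the query context passed to the factory methods. */
  static final class QueryContext {
    final String tablePath;
    final Optional<Long> version;

    QueryContext(String tablePath, Optional<Long> version) {
      this.tablePath = tablePath;
      this.version = version;
    }
  }

  /** Mutable metrics updated during construction. Creates a MetricsResult. */
  static final class Metrics {
    long loadDurationNs; // hypothetical metric name

    MetricsResult captureResult() {
      return new MetricsResult(loadDurationNs);
    }
  }

  /** Immutable snapshot of Metrics. Created by Metrics.captureResult(). */
  static final class MetricsResult {
    private final long loadDurationNs;

    MetricsResult(long loadDurationNs) {
      this.loadDurationNs = loadDurationNs;
    }

    /** Same name as the field in Metrics, so the two are easy to relate. */
    long loadDurationNs() {
      return loadDurationNs;
    }
  }

  static final class Report {
    final QueryContext context;
    final MetricsResult metrics;
    final Optional<Exception> exception;

    private Report(QueryContext context, MetricsResult metrics, Optional<Exception> exception) {
      this.context = context;
      this.metrics = metrics;
      this.exception = exception;
    }

    /** Success case: no exception attached. */
    static Report forSuccess(QueryContext context, Metrics metrics) {
      return new Report(context, metrics.captureResult(), Optional.empty());
    }

    /** Error case: carries the exception plus whatever the context holds. */
    static Report forError(QueryContext context, Metrics metrics, Exception e) {
      return new Report(context, metrics.captureResult(), Optional.of(e));
    }
  }

  public static void main(String[] args) {
    Metrics metrics = new Metrics();
    metrics.loadDurationNs = 1_500L;
    QueryContext ctx = new QueryContext("/tmp/table", Optional.of(0L));
    Report ok = Report.forSuccess(ctx, metrics);
    Report failed = Report.forError(ctx, metrics, new RuntimeException("boom"));
    System.out.println(ok.exception.isPresent() + " " + failed.exception.isPresent());
  }
}
```

Keeping the constructor private means callers cannot build a report with an inconsistent combination of arguments; they have to pick the success or the error path explicitly.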

scottsand-db · 2024-12-05T23:18:41Z

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/MetricsReportSuite.scala

+        path,
+        expectException = false,
+        expectedVersion = Optional.of(0),
+        expectedProvidedTimestamp = Optional.empty(), // No time travel


Didn't we time travel to version 0? Would // No time travel by timestamp be a more accurate comment? And I think this would apply to all the other such instances of this comment, right?

scottsand-db · 2024-12-05T23:21:06Z

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/MetricsReportSuite.scala

+    }
+  }
+
+  test("SnapshotReport valid queries") {


Could you also test: try and time travel by a bad version (e.g. version 55) or by a bad timestamp (e.g. the year 1985) but on a valid table?

e.g. you could rename this test to SnapshotReport queries on valid table

So originally I did not add these tests because my main goal with the exception cases was to cover the 3 possible places snapshot construction could fail (outlined here in design doc) instead of trying to cover all possible errors/exceptions during snapshot construction.

1. Error resolving timestamp --> version (only applicable for time-travel by timestamp)
2. Error building the log segment
3. Error constructing the snapshot (loading table protocol and metadata)

But I'm happy to add these two cases anyways. I can just add an additional test for them.
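
As a sketch of why these three failure points are worth covering separately, the snippet below models a time-travel-by-timestamp query and shows what version information an error report could carry at each point. The `Stage` enum, the `versionKnownWhenFailingAt` helper, and the resolved version `3` are hypothetical, and the sketch assumes the version is recorded immediately after timestamp resolution.

```java
import java.util.Optional;

/** Hypothetical model of the three failure points for a timestamp query. */
public class FailureStageSketch {

  /** The three places construction can fail, in execution order. */
  enum Stage {
    RESOLVE_TIMESTAMP,
    BUILD_LOG_SEGMENT,
    LOAD_PROTOCOL_AND_METADATA
  }

  /**
   * Returns the version an error report could include if construction
   * fails at the given stage.
   */
  static Optional<Long> versionKnownWhenFailingAt(Stage failAt) {
    Optional<Long> version = Optional.empty();
    for (Stage stage : Stage.values()) {
      if (stage == failAt) {
        return version; // failure: report whatever is known so far
      }
      if (stage == Stage.RESOLVE_TIMESTAMP) {
        version = Optional.of(3L); // pretend the timestamp resolved to version 3
      }
    }
    return version;
  }

  public static void main(String[] args) {
    for (Stage stage : Stage.values()) {
      System.out.println(stage + " -> version known: "
          + versionKnownWhenFailingAt(stage).isPresent());
    }
  }
}
```

Under these assumptions, only a failure during timestamp resolution produces a report without a version.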

> So originally I did not add these tests because my main goal with the exception cases was to cover the 3 possible places snapshot construction could fail (outlined here in design doc) instead of trying to cover all possible errors/exceptions during snapshot construction.

Sorry, what's unclear to me is if you didn't plan to include those tests in this PR or if there's no plan to write them in the near future

scottsand-db · 2024-12-05T23:23:42Z

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/MetricsReportSuite.scala

+class MetricsReportSuite extends AnyFunSuite with TestUtils {
+
+  ///////////////////////////
+  // SnapshotReport tests //


How many different types of reports do you anticipate we will have? If test + helper code for just SnapshotReport is ~300 LOC, I wonder if we should split this up into different files?

e.g. SnapshotMetricsReportSuite extends MetricsReportTestUtils

allisonport-db force-pushed the metrics-2 branch 2 times, most recently from e1dbe30 to ceb5a56 Compare November 27, 2024 00:36

allisonport-db changed the title ~~Metrics 2~~ [Kernel][Metrics][PR#2] Add SnapshotReport for reporting snapshot construction Nov 27, 2024

allisonport-db mentioned this pull request Nov 27, 2024

[Kernel] Support metrics and event logging #3905

Open

5 tasks

allisonport-db requested review from scottsand-db and vkorukanti November 27, 2024 01:16

allisonport-db mentioned this pull request Nov 27, 2024

[Kernel][Metrics][PR#3] Metrics report JSON serializer and LoggingMetricsReporter for the default engine #3904

Open

5 tasks

scottsand-db requested changes Dec 4, 2024

Snapshot report and tests

e1d8846

allisonport-db force-pushed the metrics-2 branch from ceb5a56 to db5c16c Compare December 5, 2024 21:52

Respond to comments

8e1f651

allisonport-db force-pushed the metrics-2 branch from db5c16c to 8e1f651 Compare December 5, 2024 21:56

allisonport-db requested a review from scottsand-db December 5, 2024 22:09

scottsand-db requested changes Dec 5, 2024
