Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel][Metrics][PR#2] Add SnapshotReport for reporting snapshot construction #3903

Merged
merged 12 commits into from
Jan 8, 2025

Conversation

allisonport-db
Copy link
Collaborator

@allisonport-db allisonport-db commented Nov 26, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Adds a SnapshotReport for reporting snapshot construction.

We record a SnapshotReport after successfully constructing a snapshot or if an exception is thrown during construction. We use SnapshotContext to propagate and update information about the snapshot construction. For example, we update the "version" as soon as it's resolved for time-travel by timestamp or load latest snapshot queries. This means in the case of the exception we include as much information as is available.

How was this patch tested?

Adds a test suite MetricsReportSuite with unit tests.

Does this PR introduce any user-facing changes?

No.

@allisonport-db allisonport-db force-pushed the metrics-2 branch 2 times, most recently from e1dbe30 to ceb5a56 Compare November 27, 2024 00:36
@allisonport-db allisonport-db changed the title Metrics 2 [Kernel][Metrics][PR#2] Add SnapshotReport for reporting snapshot construction Nov 27, 2024
Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Reviewed the production code, not the tests yet

Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just posting what we discussed in a sync! Looks great!

  1. Would be great to use factory methods to create SnapshotReportImpls. One for success/normal cases, one for error case. This should take in the SnapshotQueryContext.

  2. Ensure you have the same exact method / value names between SnapshotMetrics and SnapshotMetricsResult. Also, add docs to both that explain the relationship. i.e. SnapshotMetrics creates SnapshotMetricsResult; SnapshotMetricsResult is created by SnapshotMetrics

class MetricsReportSuite extends AnyFunSuite with TestUtils {

///////////////////////////
// SnapshotReport tests //
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many different types of reports do you anticipate we will have? If test + helper code for just SnapshotReport is ~300 LOC, I wonder if we should split this up into different files?

e.g. SnapshotMetricsReportSuite extends MetricsReportTestUtils

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now this is okay. I think the other report types will have fewer tests (much fewer error cases). If it ends up being a lot in a later PR I can separate the suites then.

Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left some comments and questions. Thanks!

import io.delta.kernel.metrics.SnapshotReport;
import java.util.Optional;

/** Stores the context for a given Snapshot query. Used to generate a {@link SnapshotReport} */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is named SnapshotQueryContext ... and you say that it "stores the context" ... but can you define a bit what "context" means? You don't need to list all of the information it contains .. but at high level, what does context mean?

Stores the context for a given Snapshot query. Used to generate a {@link SnapshotReport}.

Here, "context" means the state and information pertaining to when a Snapshot was constructed, for which table, and its associated metrics

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also talk a bit about the expected lifecycle of this class. What is it coupled to?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the class docs

Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Minor comments mainly on variable/api naming. After this LGTM


@Override
default String operationType() {
return DeltaOperationReport.OperationType.SNAPSHOT.toString();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. If we return String still I wonder if the enum isn't as useful? what if we returned the enum value instead?

i.e. how do we expect this operationType() method to be used?

I'm fine with whatever you decide, no need to block.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that it's useful to go to DeltaOperationType.OperationType and see all the possible operation types we might encounter. I think it makes sense to be string instead of the enum to try to keep the metric return types to standard built in types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe the answer to that is we don't need the enum and it's possible to just look at all the metrics interfaces we have added.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that it's useful to go to DeltaOperationReport.OperationType and see all the possible operation types we might encounter.

But there's nothing enforcing this, right? It's just declaring an enum -- children of DeltaOperationReport don't actually need to add a new type to this enum, they could just return an arbitrary string for String operationType().

If we had operationType return an enum, then it's fair to require and expect that children will add new values for the enum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe the answer to that is we don't need the enum and it's possible to just look at all the metrics interfaces we have added.

I think the question is: how will the result of operationType be used? is it just for printing? do we want to check for a particular value (in this case, an enum would be nice)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think we would probably wouldn't check for a specific value. If you want a snapshot report you would check if it's an instance of SnapshotReport.

I think i'll remove the enum and keep operationType to return a string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

@allisonport-db allisonport-db merged commit 40ba346 into delta-io:master Jan 8, 2025
19 checks passed
allisonport-db added a commit that referenced this pull request Jan 9, 2025
…ricsReporter for the default engine (#3904)

<!--
Thanks for sending a pull request!  Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
  3. Be sure to keep the PR description updated to reflect all changes.
  4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

This PR is based off of #3903
See the diff for just this PR
[here](https://github.com/delta-io/delta/pull/3904/files/aec95cf3dc0086c37f4c45e2b3e192b7b881768c..678ac473f4de65a8f7fd770696aad2d31a15aef7)

Adds a JSON serializer for metrics reports with serialization logic for
SnapshotReport. Also adds a `LoggingMetricsReporter` to the default
implementation which simply logs the JSON serialized reports using
Log4J.

## How was this patch tested?

Adds a test suite.

## Does this PR introduce _any_ user-facing changes?

No.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants