[Kernel][Metrics][PR#3] Metrics report JSON serializer and LoggingMetricsReporter for the default engine #3904

Merged on Jan 9, 2025 (5 commits)

Conversation

@allisonport-db (Collaborator) commented Nov 26, 2024

Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

Description

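This PR is based on #3903.
See the diff for just this PR here: https://github.com/delta-io/delta/pull/3904/files/aec95cf3dc0086c37f4c45e2b3e192b7b881768c..678ac473f4de65a8f7fd770696aad2d31a15aef7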

Adds a JSON serializer for metrics reports with serialization logic for SnapshotReport. Also adds a LoggingMetricsReporter to the default implementation which simply logs the JSON serialized reports using Log4J.
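
For readers skimming the thread, here is a minimal sketch of the shape described above: a reporter that turns each report into JSON and logs it as one line. It is not the code in this PR. To stay dependency-free it swaps in `java.util.logging` for Log4J and a hand-written `toJson()` for the Jackson-based serializer, and the `MetricsReport` contract, the `SnapshotReport` fields, and the `Consumer<String>` sink are all assumptions made for illustration.

```java
import java.util.function.Consumer;
import java.util.logging.Logger;

public class LoggingReporterSketch {

  /** Hypothetical report contract: each report can render itself as JSON. */
  interface MetricsReport {
    String toJson();
  }

  /** Hypothetical snapshot report with two illustrative fields. */
  static final class SnapshotReport implements MetricsReport {
    private final String tablePath;
    private final long version;

    SnapshotReport(String tablePath, long version) {
      this.tablePath = tablePath;
      this.version = version;
    }

    @Override
    public String toJson() {
      // Escape backslashes and quotes only; control characters are not
      // handled in this sketch.
      String escaped = tablePath.replace("\\", "\\\\").replace("\"", "\\\"");
      return "{\"tablePath\":\"" + escaped + "\",\"version\":" + version + "}";
    }
  }

  /** Reporter that writes each report as a single JSON line to a sink. */
  static final class LoggingMetricsReporter {
    private final Consumer<String> sink;

    LoggingMetricsReporter(Consumer<String> sink) {
      this.sink = sink;
    }

    void report(MetricsReport report) {
      // Prefix with the report's class name so log lines are easy to filter.
      sink.accept(report.getClass().getSimpleName() + " = " + report.toJson());
    }
  }

  public static void main(String[] args) {
    // Stand-in for the Log4J logger mentioned in the description.
    Logger logger = Logger.getLogger(LoggingReporterSketch.class.getName());
    LoggingMetricsReporter reporter = new LoggingMetricsReporter(logger::info);
    reporter.report(new SnapshotReport("/tmp/table", 3L));
  }
}
```

Passing the sink in as a `Consumer<String>` is only a convenience for this sketch: it lets a test capture the logged lines in a list instead of inspecting logger output.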

How was this patch tested?

Adds a test suite.

Does this PR introduce any user-facing changes?

No.

@allisonport-db allisonport-db force-pushed the metrics-3 branch 2 times, most recently from 6966594 to 632985b Compare November 27, 2024 01:39
@allisonport-db allisonport-db changed the title Metrics 3 [Kernel][Metrics][PR#3] Metrics report JSON serializer and LoggingMetricsReporter for the default engine Nov 27, 2024
@scottsand-db (Collaborator) left a comment

Looks great! 2 minor comments

@scottsand-db (Collaborator) left a comment

LGTM with two minor comments!

*
* @throws JsonProcessingException
*/
public static String serializeSnapshotReport(SnapshotReport snapshotReport)
Collaborator

Seems like we would have to do this for every type of report, right?

Any way to avoid that? (Since they would all just call OBJECT_MAPPER.writeValueAsString(inputVariable).)

Could we just take in a DeltaOperationReport?

Collaborator Author

Yeah, I wasn't sure about this because I think there's also a use case where you only serialize a specific report type. Maybe we can have both serializeSnapshotReport and serializeDeltaOperationReport.

But then I need to make DeltaOperationReport serializable (which I currently have not done). What do you think?

Collaborator

Every report we create will implement DeltaOperationReport, right?

And every report we create we will, by default, log with our default engine, right?

Then it seems fair that DeltaOperationReport be serializable 👍

Collaborator Author

Hmm I have some thoughts that I haven't fully fleshed out.

Every operation-type report we create will implement DeltaOperationReport, but possibly not every report. But yes, we probably plan to log all report types in our default engine (I can't currently think of a reason not to). That, though, would argue for making MetricsReport serializable...

If we add a new report type, e.g. XXReport extends DeltaOperationReport, but don't make it serializable, it will still be serialized but will be missing the additional information in XXReport.

Do we ever expect a report that only extends DeltaOperationReport? Maybe we just want to report that an operation occurred, or success vs. failure?

I need to verify that if both DeltaOperationReport and SnapshotReport are serializable, Jackson will use the lowest ancestor.
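
The worry about a new report type silently losing its extra fields can be illustrated without Jackson. The sketch below is an assumption-laden stand-in, not Jackson's behavior and not the PR's code: a small reflection helper, `serializeAs`, emits only the getters visible through a chosen interface, so serializing the same object through the base type drops the subtype's field. The interface shapes, `SnapshotReportImpl`, and the getter names are hypothetical. It does not answer which type Jackson itself would pick; that still needs the verification mentioned above.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class ViewTypeSketch {

  /** Hypothetical base report: fields common to every operation. */
  public interface DeltaOperationReport {
    String getOperationType();
  }

  /** Hypothetical subtype that adds a snapshot-specific field. */
  public interface SnapshotReport extends DeltaOperationReport {
    long getVersion();
  }

  public static final class SnapshotReportImpl implements SnapshotReport {
    @Override public String getOperationType() { return "Snapshot"; }
    @Override public long getVersion() { return 3L; }
  }

  /** Serializes only the getters visible through the interface {@code view}. */
  static String serializeAs(Object report, Class<?> view) {
    Map<String, Object> fields = new TreeMap<>(); // sorted keys for stable output
    for (Method m : view.getMethods()) {
      String name = m.getName();
      if (name.startsWith("get") && name.length() > 3 && m.getParameterCount() == 0) {
        // getVersion -> version
        String key = Character.toLowerCase(name.charAt(3)) + name.substring(4);
        try {
          fields.put(key, m.invoke(report));
        } catch (ReflectiveOperationException ex) {
          throw new IllegalStateException("cannot read " + name, ex);
        }
      }
    }
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, Object> entry : fields.entrySet()) {
      Object value = entry.getValue();
      // Numbers are emitted bare; everything else is quoted (no escaping here).
      String rendered = (value instanceof Number) ? value.toString() : "\"" + value + "\"";
      json.add("\"" + entry.getKey() + "\":" + rendered);
    }
    return json.toString();
  }

  public static void main(String[] args) {
    SnapshotReport report = new SnapshotReportImpl();
    // Through the base type, the version field is missing.
    System.out.println(serializeAs(report, DeltaOperationReport.class));
    // Through the subtype, both fields are present.
    System.out.println(serializeAs(report, SnapshotReport.class));
  }
}
```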

Collaborator Author

I wonder if this is something we can flesh out later? These are all internal APIs and for now, I think, making sure the main report types are serializable is enough.

Collaborator

> I wonder if this is something we can flesh out later?

Yup! SGTM!

@allisonport-db allisonport-db merged commit fb9bb13 into delta-io:master Jan 9, 2025
16 of 19 checks passed
huan233usc pushed a commit to huan233usc/delta that referenced this pull request Jan 17, 2025
…ricsReporter for the default engine (delta-io#3904)