Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KYUUBI #6024] Insert crc checksum observer after all project nodes #6025

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wForget
Copy link
Member

@wForget wForget commented Jan 29, 2024

🔍 Description

Issue References 🔗

This pull request fixes #6024

Describe Your Solution 🔧

Insert crc checksum observer after all Project nodes to compare data at all stage of SQL.

Types of changes 🔖

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Behavior Without This Pull Request ⚰️

Behavior With This Pull Request 🎉

Related Unit Tests

org.apache.spark.sql.observe.InsertChecksumObserverAfterProjectSuite


Checklist 📝

Be nice. Be informative.

@wForget wForget self-assigned this Jan 29, 2024
@wForget wForget changed the title [KYUUBI #6024] Insert crc checksum observer after all project nodes. [KYUUBI #6024] Insert crc checksum observer after all project nodes Jan 29, 2024
@wForget wForget requested a review from ulysses-you January 29, 2024 06:24
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f531a37) 61.07% compared to head (8506283) 61.08%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #6025      +/-   ##
============================================
+ Coverage     61.07%   61.08%   +0.01%     
  Complexity       23       23              
============================================
  Files           623      623              
  Lines         37090    37103      +13     
  Branches       5028     5029       +1     
============================================
+ Hits          22653    22666      +13     
- Misses        11986    11992       +6     
+ Partials       2451     2445       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

override def apply(plan: LogicalPlan): LogicalPlan = {
if (conf.getConf(INSERT_CHECKSUM_OBSERVER_AFTER_PROJECT_ENABLED)) {
plan resolveOperatorsUp {
case p: Project if p.resolved && p.getTagValue(INSERT_COLLECT_METRICS_TAG).isEmpty =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the plan is for writing ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the plan is for writing ?

Just add observer after all project nodes, I think there will also be project node before writing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we only add collect metrics for project ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we only add collect metrics for project ?

Most data inconsistencies are caused by project, such as using udf/udaf or type conversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we remove AfterProject from the rule name and support more nodes in the future?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Inject crc checksum observers at all stages of SQL
3 participants