Skip to content

fix: Fix Spark SQL AQE exchange reuse test failures #1811

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 2, 2025

Conversation

coderfender
Copy link
Contributor

@coderfender coderfender commented May 29, 2025

Which issue does this PR close?

Closes #1737

Rationale for this change

The AQE feature in spark relies on a cache (where key is canonicalized version of a plan and value is the actual SparkPlan object itself) to avoid recomputation of stages and improve stage efficiency. However, with comet and spark plans' canonicalized version being the same, the AQE incorrectly fetches Comet Plan instead of native Spark ones causing Class type exceptions as mentioned in the related github issue. The goal of this change is to prevent that by handling AQE use case between the final handoff between Comet and Spark systems. Although this change solves the AQE usecase, the goal of the PR is to come up with a better suited strategy which includes but not limited to changing the canonicalization logic of Comet plans so that Spark can differentiate them

What changes are included in this PR?

Changes to return the right object in EliminateRedundantTransitions.scala and also make sure that doExecute method inovkes Comet's method directly.

How are these changes tested?

Local testing - Running all AQE unit tests
Integration testing. - Workflow to run Spark , OS level and TPCH tests

@codecov-commenter
Copy link

codecov-commenter commented May 29, 2025

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 59.40%. Comparing base (f09f8af) to head (63e93ef).
Report is 234 commits behind head on main.

Files with missing lines Patch % Lines
...he/comet/rules/EliminateRedundantTransitions.scala 0.00% 1 Missing and 2 partials ⚠️
...t/execution/shuffle/CometShuffleExchangeExec.scala 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1811      +/-   ##
============================================
+ Coverage     56.12%   59.40%   +3.28%     
- Complexity      976     1151     +175     
============================================
  Files           119      129      +10     
  Lines         11743    12631     +888     
  Branches       2251     2368     +117     
============================================
+ Hits           6591     7504     +913     
+ Misses         4012     3919      -93     
- Partials       1140     1208      +68     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andygrove andygrove changed the title Spark : Fix AQE Tests fix: Fix Spark SQL AQE Tests May 29, 2025
@coderfender coderfender marked this pull request as ready for review May 29, 2025 22:26
@andygrove
Copy link
Member

Can we enable some of the Spark SQL tests as part of this PR by updating the diff (at least for Spark 3.5.5)?

@coderfender coderfender force-pushed the fix_breaking_tests_spark branch from a3e6a35 to 0a4be24 Compare June 2, 2025 01:14
@coderfender
Copy link
Contributor Author

Updated diff file to unignore AQE tests in Spark (v3.5.5)

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @coderfender!

@andygrove andygrove changed the title fix: Fix Spark SQL AQE Tests fix: Fix Spark SQL AQE exchange reuse test failures Jun 2, 2025
@andygrove andygrove merged commit db7d59a into apache:main Jun 2, 2025
79 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[INTERNAL_ERROR] Custom columnar rules cannot transform shuffle node to something else
3 participants