
Refactor SparkStrategy to fix Spark-4.0 build #12198

Merged: 10 commits, Feb 26, 2025

Conversation

nartal1 (Collaborator) commented Feb 22, 2025

This contributes to #12062.
This PR fixes the sql-plugin-api module and also fixes some of the build failures in the sql-plugin module that referenced SparkStrategy. We now import SparkStrategy directly instead of the alias Strategy.

The following build failures are now fixed:

```text
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/SQLExecPlugin.scala:19: object Strategy is not a member of package org.apache.spark.sql
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/SQLExecPlugin.scala:27: not found: type Strategy
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/ShimLoader.scala:30: object Strategy is not a member of package org.apache.spark.sql
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin-api/src/main/scala/com/nvidia/spark/rapids/ShimLoader.scala:354: not found: type Strategy
[ERROR] four errors found
```

```text
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/StrategyRules.scala:21: object Strategy is not a member of package org.apache.spark.sql
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/StrategyRules.scala:30: not found: type Strategy
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/StrategyRules.scala:32: not found: type Strategy
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/delta/DeltaProvider.scala:21: object Strategy is not a member of package org.apache.spark.sql
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/delta/DeltaProvider.scala:44: not found: type Strategy
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/StrategyRules.scala:43: value nonEmpty is not a member of Nothing
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/StrategyRules.scala:32: private val strategies in class StrategyRules is never used
Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=unused-privates, site=com.nvidia.spark.rapids.StrategyRules.strategies
[ERROR] [Error] /home/nartal/spark-rapids-2504/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/delta/DeltaProvider.scala:100: not found: type Strategy
```
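All of these errors trace back to the same root cause: `Strategy` was a type alias for `SparkStrategy` defined in the `org.apache.spark.sql` package object, and that alias is not visible when compiling against Spark 4.0. The shape of the fix at each affected call site is a one-line import change (an illustrative sketch, not the exact patch):

```diff
-import org.apache.spark.sql.Strategy
+import org.apache.spark.sql.execution.SparkStrategy
```

with any `Strategy`-typed declarations renamed to `SparkStrategy` accordingly.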

@nartal1 nartal1 added labels: build (Related to CI / CD or cleanly building), Spark 4.0+ (Spark 4.0+ issues) on Feb 22, 2025
@nartal1 nartal1 self-assigned this Feb 22, 2025
@nartal1 nartal1 requested a review from a team as a code owner February 22, 2025 00:58
nartal1 (Collaborator, Author) commented Feb 22, 2025

build

nartal1 commented Feb 23, 2025

build

@nartal1 nartal1 closed this Feb 24, 2025
@nartal1 nartal1 reopened this Feb 24, 2025
@nartal1 nartal1 changed the title from "[DO NOT REVIEW] Refactor SparkStrategy to fix Spark-4.0 build" to "Refactor SparkStrategy to fix Spark-4.0 build" on Feb 24, 2025
nartal1 commented Feb 24, 2025

build

Signed-off-by: Niranjan Artal <[email protected]>
nartal1 commented Feb 24, 2025

build

```diff
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}

 /**
  * Extension point to enable GPU SQL processing.
  */
-class SQLExecPlugin extends (SparkSessionExtensions => Unit) {
+class SQLExecPlugin extends (SparkSessionExtensions => Unit) with ConnectShims {
```
gerashegalov (Collaborator) commented Feb 24, 2025

I think it is fine either way, but the files will change to a smaller extent, and more uniformly, if we pulled in the type alias via an import:

import com.nvidia.spark.rapids.ConnectShims._

instead of requiring with/extends updates.
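A minimal, self-contained sketch of this import-based pattern (a local stub class stands in for Spark's `SparkStrategy` so the snippet compiles without a Spark dependency; the `ConnectShims` object here is modeled on the shim discussed in this thread, not its actual contents):

```scala
// Stub standing in for org.apache.spark.sql.execution.SparkStrategy.
class SparkStrategy {
  def name: String = "stub-strategy"
}

// Shim object exposing the alias; call sites pull it in with one import
// instead of mixing in a trait via `with`/`extends`.
object ConnectShims {
  type Strategy = SparkStrategy
}

object AliasDemo {
  import ConnectShims._
  // `Strategy` and `SparkStrategy` denote the same type, so no casts are needed.
  val s: Strategy = new SparkStrategy
  def main(args: Array[String]): Unit = println(s.name)
}
```

Because a type alias is fully transparent, per-version shims only need to change what the alias points at, while call-site code keeps using one name.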

nartal1 (Collaborator, Author) replied:

Thanks Gera! I will update it. I was wondering if you had any preference for object vs trait? For SparkSession I am planning to create it as an object in the next PR, since it needs to be shimmed and cannot live in common code:

```scala
import org.apache.spark.sql.classic.SparkSession

object TrampolineConnectShims {
  type SparkSession = org.apache.spark.sql.classic.SparkSession

  def cleanupAnyExistingSession(): Unit = SparkSession.cleanupAnyExistingSession()
}
```

So for SparkStrategy, is keeping it as a trait okay? (The reason I was thinking of keeping it as an object is that we just declare the type in the trait and don't actually override any functions.)

gerashegalov (Collaborator) replied:

My preference is to import the type alias via an object.
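To illustrate the two options being weighed (stub types, not the actual plugin classes): a trait alias must be mixed in at every declaration site, while an object alias is picked up by a single import per file, which is why the import form touches fewer lines:

```scala
class SparkStrategy

// Trait form: every class carrying the alias needs a `with`/`extends` change.
trait ShimTrait {
  type Strategy = SparkStrategy
}

// Object form: one import per file, no changes to class headers.
object ShimObject {
  type Strategy = SparkStrategy
}

class ViaTrait extends ShimTrait {
  def make: Strategy = new SparkStrategy
}

object ViaImport {
  import ShimObject._
  def make: Strategy = new SparkStrategy
}
```

Both forms compile to the same type member; the difference is purely how call sites bring the alias into scope.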

nartal1 (Collaborator, Author) replied:

Thanks! I have updated the PR to import the type alias using an object.

nartal1 commented Feb 25, 2025

build

nartal1 commented Feb 25, 2025

build

@@ -0,0 +1,21 @@
/*
A collaborator commented:

Shouldn't this class be shimmed and have two versions, one for Spark 4.0.0 and the other for the rest of the shims?

nartal1 (Collaborator, Author) replied:

Since I added this in sql-plugin-api, I kept it in one file, as we should not be shimming anything in sql-plugin-api. For the SparkSession build-fix PR #12227 I did add shims for Spark-4.0 and the rest of the shims. Please let me know if that's okay.

A collaborator replied:

I think the confusion is that Strategy was just an alias for SparkStrategy, and the upgrade to Spark 4 only affects the alias. So let us try to simply use SparkStrategy directly.

nartal1 commented Feb 26, 2025

build

@nartal1 nartal1 merged commit fffd82c into NVIDIA:branch-25.04 Feb 26, 2025
50 of 52 checks passed
razajafri (Collaborator) left a comment:

+1

Labels
build Related to CI / CD or cleanly building Spark 4.0+ Spark 4.0+ issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants