
[SPARK-51537][CONNECT][CORE] construct the session-specific classloader based on the default session classloader on executor #50334

Open · wants to merge 3 commits into base: master
Conversation


@wbo4958 wbo4958 commented Mar 20, 2025

What changes were proposed in this pull request?

This PR constructs the session-specific classloader on the executor side in Connect mode on top of the default session classloader, which has already added the global jars (e.g., those added via --jars) to its classpath.

Why are the changes needed?

In Spark Connect mode, when connecting to a non-local (e.g., standalone) cluster, the executor creates an isolated session state that includes a session-specific classloader for each task. However, a notable issue arises: this session-specific classloader does not include the global JARs specified by the --jars option in the classpath. This oversight can lead to deserialization exceptions. For example:

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2096)
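The delegation failure can be illustrated outside Spark with plain JDK classloaders. This is a minimal sketch, not Spark's actual executor code: the bootstrap-parented loader stands in for a session classloader built without the default session's classpath, and the system classloader stands in for the default session classloader that already holds the global jars.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderDemo {

    // Returns true if `loader` can resolve the named class.
    static boolean canSee(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A loader parented on the bootstrap loader: application classes
        // (such as this one) are invisible to it, just as --jars classes are
        // invisible to a session classloader that skips the default session
        // classloader.
        ClassLoader isolated = new URLClassLoader(new URL[0], null);
        System.out.println("isolated: " + canSee(isolated, "ClassLoaderDemo"));

        // Parenting the child on the loader that already holds the needed
        // classes restores visibility through normal parent delegation,
        // without copying any URLs into the child.
        ClassLoader chained =
            new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
        System.out.println("chained: " + canSee(chained, "ClassLoaderDemo"));
    }
}
```

When deserialization of a task's lambda runs under the first kind of loader, the capturing class cannot be resolved, which surfaces as the SerializedLambda ClassCastException above.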

Does this PR introduce any user-facing change?

No

How was this patch tested?

The newly added test passes, and the following manual test passes as well:

  1. Clone the minimal project that reproduces this issue:
git clone [email protected]:wbo4958/ConnectMLIssue.git
  2. Compile the project:
mvn clean package
  3. Start a standalone cluster:
$SPARK_HOME/sbin/start-master.sh -h localhost
$SPARK_HOME/sbin/start-worker.sh spark://localhost:7077
  4. Start a Connect server that connects to the standalone cluster:
./standalone.sh
  5. Play around with the demo

Run the repro script in a pyspark client environment:

python repro-issue.py

Without this PR, you will see the exception below:

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2096)
	at java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(ObjectStreamClass.java:2060)
	at java.io.ObjectStreamClass.checkObjFieldValueTypes(ObjectStreamClass.java:1347)
	at java.io.ObjectInputStream$FieldValues.defaultCheckFieldValues(ObjectInputStream.java:2679)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2486)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
	at java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2606)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2457)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2257)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1733)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:509)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:467)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:88)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:136)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:86)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:171)
	at org.apache.spark.scheduler.Task.run(Task.scala:147)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:645)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:100)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:648)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:840)

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Mar 20, 2025
@wbo4958 wbo4958 changed the title [SPARK-51537][CONNECT] [constructed classpath using both global jars and session specific jars in executor [SPARK-51537][CONNECT][CORE] [constructed classpath using both global jars and session specific jars in executor Mar 20, 2025

@wbo4958 wbo4958 force-pushed the connect-executor-classpath branch from 7b963dc to bbe2a94 Compare March 21, 2025 02:17
@wbo4958 wbo4958 changed the title [SPARK-51537][CONNECT][CORE] [constructed classpath using both global jars and session specific jars in executor [SPARK-51537][CONNECT][CORE] construct classpath using both global jars and session specific jars in executor Mar 21, 2025
@wbo4958 wbo4958 marked this pull request as ready for review March 21, 2025 02:43
@wbo4958 wbo4958 changed the title [SPARK-51537][CONNECT][CORE] construct classpath using both global jars and session specific jars in executor [SPARK-51537][CONNECT][CORE] construct classpath using both global jars and session specific jars on executor Mar 21, 2025
@wbo4958
Copy link
Contributor Author

wbo4958 commented Mar 21, 2025

Hi @hvanhovell @zhenlineo @HyukjinKwon @vicennial, could you help review this PR? Thanks very much.

@wbo4958 wbo4958 marked this pull request as ready for review March 25, 2025 08:01
@wbo4958 wbo4958 changed the title [SPARK-51537][CONNECT][CORE] construct classpath using both global jars and session specific jars on executor [SPARK-51537][CONNECT][CORE] construct the session-specific classloader based on the default session classloader on executor Mar 25, 2025
@vicennial
Contributor

Thanks for identifying this issue, @wbo4958! While your PR resolves the executor-side problem, I believe we have a chance to refine our approach to cover both executor operations (e.g., typical UDFs) and driver operations (e.g., custom data sources) in one unified solution.

The high-level proposal: In the ArtifactManager, add an initialisation step that would copy JARs from the underlying session.sparkContext.addedJars(DEFAULT_SESSION_ID) into session.sparkContext.addedJars(session.sessionUUID).
Advantages:

  • Enhanced session isolation
    • Global JARs are copied during initialization, so any subsequent changes to the default session jars do not affect the session-specific context.
    • This isolation is particularly beneficial in standalone clusters where Spark Connect sessions coexist with traditional sessions (i.e., those interacting directly with SparkContext).
  • Since the copied global JARs behave as session-scoped JARs, no extra modifications to the executor’s code or classloader are required.

The downside is that duplicating the global JARs for each new Spark Connect session will naturally consume more resources. We could mitigate this by adding a Spark configuration option to toggle whether global JARs are inherited into a Spark Connect session.
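The copy-on-initialisation semantics of this proposal can be sketched in isolation. This is a hypothetical model, not the real ArtifactManager: `addedJars` stands in for SparkContext's per-session jar registry as a plain map keyed by session id, and all names are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JarInheritanceSketch {
    // Hypothetical stand-in for the per-session jar registry:
    // session id -> (jar path -> added-at timestamp).
    static final Map<String, Map<String, Long>> addedJars = new ConcurrentHashMap<>();
    static final String DEFAULT_SESSION_ID = "default";

    // Sketch of the proposed initialisation step: snapshot the global
    // (default-session) jars into the new session's own entry, so later
    // changes to the default session do not leak into it.
    static void initSession(String sessionUUID) {
        Map<String, Long> globals =
            addedJars.getOrDefault(DEFAULT_SESSION_ID, Map.of());
        addedJars.computeIfAbsent(sessionUUID, k -> new ConcurrentHashMap<>())
                 .putAll(globals);
    }

    public static void main(String[] args) {
        addedJars.put(DEFAULT_SESSION_ID,
            new ConcurrentHashMap<>(Map.of("spark://host/jars/global.jar", 1L)));
        initSession("session-1");
        // A jar added globally after initialisation stays invisible to
        // session-1, giving the isolation described above.
        addedJars.get(DEFAULT_SESSION_ID).put("spark://host/jars/late.jar", 2L);
        System.out.println(addedJars.get("session-1").keySet());
    }
}
```

Because the copied jars are indistinguishable from session-scoped jars, the executor's existing per-session classloader construction would pick them up with no further changes, which is the advantage the comment above describes.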

WDYT?
