
Commit 22f1905

[SPARK-51407][CONNECT][DOCS] Document missed Spark Connect configurations
### What changes were proposed in this pull request?

This PR aims to document the missed `spark.connect.*` configurations by syncing:

- sql/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala
- docs/configuration.md

### Why are the changes needed?

**Apache Spark 3.5.0**

- spark.connect.jvmStacktrace.maxSize
- spark.sql.connect.ui.retainedSessions
- spark.sql.connect.ui.retainedStatements

**Apache Spark 4.0.0**

- spark.connect.grpc.binding.address
- spark.connect.grpc.port.maxRetries
- spark.connect.ml.backend.classes
- spark.sql.connect.enrichError.enabled
- spark.sql.connect.serverStacktrace.enabled
- spark.connect.grpc.maxMetadataSize
- spark.connect.progress.reportInterval

### Does this PR introduce _any_ user-facing change?

This updates only the configuration documentation and the generated `configuration` HTML page.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50171 from dongjoon-hyun/SPARK-51407.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent a30bdc3 commit 22f1905
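The settings documented by this commit are ordinary Spark configurations, so in practice they are supplied via `spark-defaults.conf` or `--conf` when the Spark Connect server is launched. The Scala sketch below is only a hedged illustration of what the keys and values look like when set programmatically; the master, application name, and all values are placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, assuming a local test session: the keys below are among the
// configurations documented by this commit, with placeholder values. In a real
// deployment they would normally be set in spark-defaults.conf or passed with
// --conf when the Spark Connect server is launched.
val spark = SparkSession.builder()
  .master("local[*]")                                        // local test only
  .appName("connect-config-example")                         // placeholder name
  .config("spark.connect.grpc.binding.address", "0.0.0.0")   // bind address (4.0.0)
  .config("spark.connect.grpc.binding.port", "15002")        // bind port (3.4.0)
  .config("spark.connect.grpc.port.maxRetries", "3")         // retries on port conflicts (4.0.0)
  .config("spark.connect.progress.reportInterval", "2s")     // progress report cadence (4.0.0)
  .getOrCreate()
```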

2 files changed, +82 -0 lines changed


docs/configuration.md

+80
@@ -3398,6 +3398,14 @@ They are typically set via the config file and command-line options with `--conf
   <td>For Spark Classic applications, specify whether to automatically use Spark Connect by running a local Spark Connect server. The value can be <code>classic</code> or <code>connect</code>.</td>
   <td>4.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.connect.grpc.binding.address</code></td>
+  <td>
+    (none)
+  </td>
+  <td>Address for Spark Connect server to bind.</td>
+  <td>4.0.0</td>
+</tr>
 <tr>
   <td><code>spark.connect.grpc.binding.port</code></td>
   <td>
@@ -3406,6 +3414,14 @@ They are typically set via the config file and command-line options with `--conf
   <td>Port for Spark Connect server to bind.</td>
   <td>3.4.0</td>
 </tr>
+<tr>
+  <td><code>spark.connect.grpc.port.maxRetries</code></td>
+  <td>
+    0
+  </td>
+  <td>The max port retry attempts for the gRPC server binding. By default, it's set to 0, and the server will fail fast in case of port conflicts.</td>
+  <td>4.0.0</td>
+</tr>
 <tr>
   <td><code>spark.connect.grpc.interceptor.classes</code></td>
   <td>
@@ -3459,6 +3475,70 @@ Expression types in proto.</td>
 Command types in proto.</td>
   <td>3.4.0</td>
 </tr>
+<tr>
+  <td><code>spark.connect.ml.backend.classes</code></td>
+  <td>
+    (none)
+  </td>
+  <td>Comma separated list of classes that implement the trait org.apache.spark.sql.connect.plugin.MLBackendPlugin to replace the specified Spark ML operators with a backend-specific implementation.</td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.jvmStacktrace.maxSize</code></td>
+  <td>
+    1024
+  </td>
+  <td>Sets the maximum stack trace size to display when `spark.sql.pyspark.jvmStacktrace.enabled` is true.</td>
+  <td>3.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.sql.connect.ui.retainedSessions</code></td>
+  <td>
+    200
+  </td>
+  <td>The number of client sessions kept in the Spark Connect UI history.</td>
+  <td>3.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.sql.connect.ui.retainedStatements</code></td>
+  <td>
+    200
+  </td>
+  <td>The number of statements kept in the Spark Connect UI history.</td>
+  <td>3.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.sql.connect.enrichError.enabled</code></td>
+  <td>
+    true
+  </td>
+  <td>When true, it enriches errors with full exception messages and optionally server-side stacktrace on the client side via an additional RPC.</td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.sql.connect.serverStacktrace.enabled</code></td>
+  <td>
+    true
+  </td>
+  <td>When true, it sets the server-side stacktrace in the user-facing Spark exception.</td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.grpc.maxMetadataSize</code></td>
+  <td>
+    1024
+  </td>
+  <td>Sets the maximum size of metadata fields. For instance, it restricts metadata fields in `ErrorInfo`.</td>
+  <td>4.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.connect.progress.reportInterval</code></td>
+  <td>
+    2s
+  </td>
+  <td>The interval at which the progress of a query is reported to the client. If the value is set to a negative value the progress reports will be disabled.</td>
+  <td>4.0.0</td>
+</tr>
 </table>
 
 ### Security
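To complement the table above, here is a hedged client-side sketch. The endpoint and query are placeholders, and it assumes the Spark Connect Scala client, whose session builder accepts a `remote("sc://...")` address. The error- and progress-related settings documented above live on the server, but their effect is what a connected client observes.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder endpoint; assumes a Spark Connect server is already running there.
val spark = SparkSession.builder()
  .remote("sc://localhost:15002")
  .getOrCreate()

try {
  // While this runs, progress updates arrive at the cadence configured by
  // spark.connect.progress.reportInterval on the server (default 2s).
  spark.sql("SELECT assert_true(false)").collect()
} catch {
  case e: Exception =>
    // How much detail reaches the client (full message chain, server-side
    // stack trace) is governed by spark.sql.connect.enrichError.enabled and
    // spark.sql.connect.serverStacktrace.enabled on the server.
    println(e.getMessage)
}
```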

sql/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala

+2
@@ -29,12 +29,14 @@ object Connect {
 
   val CONNECT_GRPC_BINDING_ADDRESS =
     buildStaticConf("spark.connect.grpc.binding.address")
+      .doc("The address for Spark Connect server to bind.")
       .version("4.0.0")
       .stringConf
       .createOptional
 
   val CONNECT_GRPC_BINDING_PORT =
     buildStaticConf("spark.connect.grpc.binding.port")
+      .doc("The port for Spark Connect server to bind.")
       .version("3.4.0")
       .intConf
       .createWithDefault(ConnectCommon.CONNECT_GRPC_BINDING_PORT)
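For context on the Scala change above: helpers such as `buildStaticConf` build on Spark's internal `ConfigBuilder`, and the `.doc()` string added in this commit is the text that the table in docs/configuration.md is expected to mirror. The sketch below is illustrative only; the object name, key, and flag are made up.

```scala
package org.apache.spark.sql.connect.config

import org.apache.spark.internal.config.ConfigBuilder

object ConnectConfigSketch {
  // Hypothetical entry, shown only to illustrate the pattern: a key, a doc
  // string, the version it appeared in, a type, and a default value are the
  // same pieces of information that the documentation table carries.
  private[connect] val EXAMPLE_FLAG =
    ConfigBuilder("spark.connect.example.flag")
      .doc("Hypothetical example of a documented Spark Connect configuration.")
      .version("4.0.0")
      .booleanConf
      .createWithDefault(false)
}
```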
