Commit dd36b10

[SPARK-47618][CORE] Use Magic Committer for all S3 buckets by default
1 parent 394eebd commit dd36b10

2 files changed

+6
-0
lines changed

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 4 additions & 0 deletions
@@ -423,6 +423,10 @@ class SparkContext(config: SparkConf) extends Logging {
     if (!_conf.contains("spark.app.name")) {
       throw new SparkException("An application name must be set in your configuration")
     }
+    // Enable Magic Committer by default for all S3 buckets if hadoop-cloud module exists
+    if (Utils.classIsLoadable("org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")) {
+      conf.setIfMissing("spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled", "true")
+    }
     // This should be set as early as possible.
     SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)

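The added block uses `setIfMissing`, so a value the user has already set for this key is left untouched, and nothing is set when the hadoop-cloud class cannot be loaded. A minimal sketch of that precedence follows, using a plain mutable map as a stand-in for `SparkConf`. The `ConfSketch` object, its helpers, and the `hadoopCloudPresent` flag are hypothetical names for illustration and are not part of Spark.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkConf: it only models setIfMissing precedence.
object ConfSketch {
  val MagicKey = "spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled"

  // Set the key only when no value is present yet.
  def setIfMissing(conf: mutable.Map[String, String], key: String, value: String): Unit =
    if (!conf.contains(key)) conf(key) = value

  // Apply the new default. The boolean flag stands in for the
  // Utils.classIsLoadable(...) check on the hadoop-cloud module.
  def applyDefault(conf: mutable.Map[String, String], hadoopCloudPresent: Boolean): Unit =
    if (hadoopCloudPresent) setIfMissing(conf, MagicKey, "true")
}
```

Under these assumptions, an empty map gains the key with value `true`, a map that already holds `false` keeps it, and a map processed with `hadoopCloudPresent = false` stays unchanged.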
docs/core-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.5 to 4.0
 
+- Since Spark 4.0, Spark uses Apache Hadoop Magic Committer for all S3 buckets by default. To restore the behavior before Spark 4.0, you can set `spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled=false`.
+
 - Since Spark 4.0, Spark migrated all its internal reference of servlet API from `javax` to `jakarta`
 
 - Since Spark 4.0, Spark will roll event logs to archive them incrementally. To restore the behavior before Spark 4.0, you can set `spark.eventLog.rolling.enabled` to `false`.
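One way to apply the opt-out from the migration note in application code is to set the key on the `SparkConf` before the `SparkContext` is constructed, since the default is only filled in when the key is missing. The sketch below assumes spark-core is on the classpath. The object name, the application name, and the `local[*]` master are placeholders chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative only: object name, app name, and master are placeholders.
object DisableMagicCommitterExample {
  val MagicKey = "spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("magic-committer-opt-out") // placeholder application name
      .setMaster("local[*]")                 // placeholder master for a local run
      .set(MagicKey, "false")                // explicit opt-out from the migration note

    val sc = new SparkContext(conf)
    try {
      // Print the effective value after SparkContext initialization.
      println(s"$MagicKey = ${sc.getConf.get(MagicKey)}")
    } finally {
      sc.stop()
    }
  }
}
```

The same key can also be passed at submit time, for example with `--conf spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled=false`, quoted as needed so the shell does not expand the `*`.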
