Commit d9173dd

[SPARK-47618][CORE] Use Magic Committer for all S3 buckets by default
1 parent 394eebd commit d9173dd

2 files changed: +6, -10 lines

core/src/main/scala/org/apache/spark/SparkContext.scala

Lines changed: 4 additions & 10 deletions
@@ -424,7 +424,7 @@ class SparkContext(config: SparkConf) extends Logging {
       throw new SparkException("An application name must be set in your configuration")
     }
     // This should be set as early as possible.
-    SparkContext.fillMissingMagicCommitterConfsIfNeeded(_conf)
+    SparkContext.enableMagicCommitterIfNeeded(_conf)
 
     SparkContext.supplementJavaModuleOptions(_conf)
     SparkContext.supplementJavaIPv6Options(_conf)
@@ -3378,16 +3378,10 @@ object SparkContext extends Logging {
   }
 
   /**
-   * This is a helper function to complete the missing S3A magic committer configurations
-   * based on a single conf: `spark.hadoop.fs.s3a.bucket.<bucket>.committer.magic.enabled`
+   * Enable Magic Committer by default for all S3 buckets if hadoop-cloud module exists.
    */
-  private def fillMissingMagicCommitterConfsIfNeeded(conf: SparkConf): Unit = {
-    val magicCommitterConfs = conf
-      .getAllWithPrefix("spark.hadoop.fs.s3a.bucket.")
-      .filter(_._1.endsWith(".committer.magic.enabled"))
-      .filter(_._2.equalsIgnoreCase("true"))
-    if (magicCommitterConfs.nonEmpty) {
-      // Try to enable S3 magic committer if missing
+  private def enableMagicCommitterIfNeeded(conf: SparkConf): Unit = {
+    if (Utils.classIsLoadable("org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")) {
       conf.setIfMissing("spark.hadoop.fs.s3a.committer.magic.enabled", "true")
       if (conf.get("spark.hadoop.fs.s3a.committer.magic.enabled").equals("true")) {
         conf.setIfMissing("spark.hadoop.fs.s3a.committer.name", "magic")
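To make the behavior change in `SparkContext.scala` concrete, here is a minimal, Spark-free sketch that models both the removed and the added function. Assumptions: a `mutable.Map` stands in for `SparkConf` (only `setIfMissing`/`get` semantics are modeled); the `hadoopCloudPresent` flag stands in for `Utils.classIsLoadable(...)`; `classIsLoadable` below is a hypothetical stand-in built on `Class.forName`; and only the two keys visible in this hunk are modeled, since the real function continues past the lines shown and may set further keys.

```scala
import scala.collection.mutable

// Minimal model of the two functions in this diff, runnable without Spark.
// Assumption: a mutable.Map stands in for SparkConf, and only the two keys
// visible in the hunk are modeled; the real function may set more keys.
object MagicCommitterSketch {
  type Conf = mutable.Map[String, String]

  val MagicEnabled = "spark.hadoop.fs.s3a.committer.magic.enabled"
  val CommitterName = "spark.hadoop.fs.s3a.committer.name"
  val HadoopCloudClass = "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"

  // Same semantics as SparkConf.setIfMissing: write only when the key is absent.
  private def setIfMissing(conf: Conf, key: String, value: String): Unit =
    if (!conf.contains(key)) conf(key) = value

  // Body shared by both versions: default the global flag to "true", then
  // default the committer name to "magic" only if the flag is exactly "true".
  private def applyMagicDefaults(conf: Conf): Unit = {
    setIfMissing(conf, MagicEnabled, "true")
    if (conf(MagicEnabled).equals("true")) setIfMissing(conf, CommitterName, "magic")
  }

  // Removed version: runs only if some per-bucket
  // spark.hadoop.fs.s3a.bucket.<bucket>.committer.magic.enabled is "true" (any case).
  def oldBehavior(conf: Conf): Unit = {
    val hasPerBucketOptIn = conf.exists { case (k, v) =>
      k.startsWith("spark.hadoop.fs.s3a.bucket.") &&
        k.endsWith(".committer.magic.enabled") &&
        v.equalsIgnoreCase("true")
    }
    if (hasPerBucketOptIn) applyMagicDefaults(conf)
  }

  // Added version: runs whenever the hadoop-cloud module is on the classpath.
  // The Boolean stands in for Utils.classIsLoadable(HadoopCloudClass).
  def newBehavior(conf: Conf, hadoopCloudPresent: Boolean): Unit =
    if (hadoopCloudPresent) applyMagicDefaults(conf)

  // Hypothetical stand-in for Utils.classIsLoadable, built on Class.forName.
  def classIsLoadable(className: String): Boolean =
    try { Class.forName(className, false, getClass.getClassLoader); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    val conf: Conf = mutable.Map.empty
    newBehavior(conf, classIsLoadable(HadoopCloudClass))
    println(conf) // contents depend on whether hadoop-cloud is on the classpath
  }
}
```

What the model makes visible: both versions only ever call `setIfMissing`, so any value a user sets explicitly still wins. The commit changes the trigger, from an explicit per-bucket opt-in to the mere presence of the hadoop-cloud module.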

docs/core-migration-guide.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ license: |
 
 ## Upgrading from Core 3.5 to 4.0
 
+- Since Spark 4.0, Spark uses Apache Hadoop Magic Committer for all S3 buckets by default. To restore the behavior before Spark 4.0, you can set `spark.hadoop.fs.s3a.bucket.*.committer.magic.enabled=false`.
+
 - Since Spark 4.0, Spark migrated all its internal reference of servlet API from `javax` to `jakarta`
 
 - Since Spark 4.0, Spark will roll event logs to archive them incrementally. To restore the behavior before Spark 4.0, you can set `spark.eventLog.rolling.enabled` to `false`.
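The migration note's opt-out is a per-bucket setting, while the Spark-side default is written to the global `fs.s3a.committer.magic.enabled` key. The sketch below is a simplified model of how the two interact, under stated assumptions: Spark copies `spark.hadoop.*` entries into the Hadoop configuration with the prefix removed, and S3A's per-bucket configuration lets `fs.s3a.bucket.<bucket>.<option>` override `fs.s3a.<option>` for that bucket only. It reads the `*` in the note as a placeholder for a bucket name (an interpretation, not stated in the note); the bucket names `legacy-data` and `events` are hypothetical. It only models which flag value S3A would see per bucket, not what the write path then does with it.

```scala
// Simplified model (assumption): "spark.hadoop.*" keys reach Hadoop with the
// prefix stripped, and "fs.s3a.bucket.<bucket>.<option>" overrides
// "fs.s3a.<option>" for that one bucket. Real resolution has more rules.
object PerBucketOptOutSketch {
  private val SparkHadoopPrefix = "spark.hadoop."

  // Keep only spark.hadoop.* entries and strip the prefix, as Hadoop would see them.
  def toHadoopConf(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (k, v) if k.startsWith(SparkHadoopPrefix) => k.stripPrefix(SparkHadoopPrefix) -> v
    }

  // Effective value of an S3A option for one bucket: per-bucket key wins, else global key.
  def s3aOption(hadoopConf: Map[String, String], bucket: String, option: String): Option[String] =
    hadoopConf.get(s"fs.s3a.bucket.$bucket.$option").orElse(hadoopConf.get(s"fs.s3a.$option"))

  def main(args: Array[String]): Unit = {
    // Global default as written by Spark 4.0, plus a hypothetical per-bucket opt-out.
    val sparkConf = Map(
      "spark.hadoop.fs.s3a.committer.magic.enabled" -> "true",
      "spark.hadoop.fs.s3a.bucket.legacy-data.committer.magic.enabled" -> "false")
    val hadoopConf = toHadoopConf(sparkConf)
    for (bucket <- Seq("legacy-data", "events")) {
      // prints "legacy-data -> Some(false)" then "events -> Some(true)"
      println(s"$bucket -> ${s3aOption(hadoopConf, bucket, "committer.magic.enabled")}")
    }
  }
}
```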