
[SPARK-47618][CORE] Use Magic Committer for all S3 buckets by default #51010

Status: Open — wants to merge 1 commit into master
Conversation

@dongjoon-hyun (Member) commented May 24, 2025

What changes were proposed in this pull request?

This PR aims to use Apache Hadoop Magic Committer for all S3 buckets by default in Apache Spark 4.1.0.
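As a rough illustration of what "use the Magic Committer by default" involves, these are the settings that the Hadoop S3A and Spark cloud-integration documentation describe for enabling the committer manually today. This is a sketch of the existing opt-in configuration, not the diff of this PR; the exact defaults the PR changes may differ:

```properties
# Opt-in Magic Committer configuration per the Spark/Hadoop cloud docs
# (illustrative; not copied from this PR's diff).
spark.hadoop.fs.s3a.committer.magic.enabled  true
spark.hadoop.fs.s3a.committer.name           magic
spark.sql.sources.commitProtocolClass        org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class     org.apache.spark.internal.io.cloud.BinnedPathOutputCommitter
```

With this PR, users writing to `s3a://` paths would get this behavior without setting it themselves.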

Why are the changes needed?

The Apache Hadoop Magic Committer has offered the best write performance for S3 buckets since Amazon S3 became fully consistent on December 1st, 2020. As the Amazon S3 documentation states:

Amazon S3 provides strong read-after-write consistency for PUT and DELETE requests of objects in your Amazon S3 bucket in all AWS Regions. This behavior applies both to writes of new objects and to PUT requests that overwrite existing objects, as well as to DELETE requests. In addition, read operations on Amazon S3 Select, Amazon S3 access control lists (ACLs), Amazon S3 Object Tags, and object metadata (for example, the HEAD object) are strongly consistent.

Does this PR introduce any user-facing change?

Yes, the migration guide is updated.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member, Author) commented:

cc @viirya, @yaooqinn, @peter-toth

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-47618] Use Magic Committer for all S3 buckets by default [SPARK-47618][CORE] Use Magic Committer for all S3 buckets by default May 24, 2025