feat(writer): Add clustered and fanout writer #1735

CTTY · 2025-10-10T04:04:56Z

Which issue does this PR close?

Closes #1572 (Implement fanout partitioned data writer) and #1573 (Implement clustered partitioned data writer).

What changes are included in this PR?

New:

Added new partitioning module with PartitioningWriter trait
ClusteredWriter: Optimized for pre-sorted data, requires writing in partition order
FanoutWriter: Flexible writer that can handle data from any partition at any time
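
To make the fanout behaviour concrete, here is a minimal synchronous sketch. It is not the code from this PR: `FanoutSketch`, the `String` partition key, and the `Vec<i64>` "rows" are hypothetical stand-ins chosen so the example needs only the standard library. The point it illustrates is that a fanout writer keeps one open writer per partition, so input may revisit a partition at any time.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a partition key; the real type is richer.
type PartitionKey = String;

/// Simplified fanout writer: one open buffer per partition,
/// so rows may arrive for any partition in any order.
#[derive(Default)]
struct FanoutSketch {
    open: HashMap<PartitionKey, Vec<i64>>,
}

impl FanoutSketch {
    /// Appends rows to the buffer for `key`, opening it on first use.
    fn write(&mut self, key: PartitionKey, rows: &[i64]) {
        self.open.entry(key).or_default().extend_from_slice(rows);
    }

    /// Consumes the writer and returns one (partition, rows) "file"
    /// per partition, sorted by key for deterministic output.
    fn close(self) -> Vec<(PartitionKey, Vec<i64>)> {
        let mut files: Vec<_> = self.open.into_iter().collect();
        files.sort_by(|a, b| a.0.cmp(&b.0));
        files
    }
}

/// Writes to "a", then "b", then "a" again.
fn fanout_demo() -> Vec<(PartitionKey, Vec<i64>)> {
    let mut w = FanoutSketch::default();
    w.write("a".to_string(), &[1]);
    w.write("b".to_string(), &[2]);
    w.write("a".to_string(), &[3]); // revisiting "a" is fine for fanout
    w.close()
}
```

The trade-off is memory: every partition seen so far stays open until `close`, which is why a clustered variant is attractive when the input is already ordered by partition.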

Modification:

(BREAKING) Modified DataFileWriterBuilder to support dynamic partition assignment
Updated DataFusion integration to use the new writer API

Are these changes tested?

Added unit tests

CTTY · 2025-10-10T04:05:47Z

crates/iceberg/src/writer/mod.rs

-    /// Build the iceberg writer.
-    async fn build(self) -> Result<Self::R>;
+    /// Build the iceberg writer for an optional partition key.
+    async fn build_with_partition(self, partition_key: Option<PartitionKey>) -> Result<Self::R>;


This is a breaking change. I believe it is necessary because:

IcebergWriter is supposed to generate DataFiles that always hold a partition value, according to the Iceberg spec.

The existing code stores the partition value in the builder directly, making builder.clone() useless:

    let builder = IcebergWriterBuilder::new(partition_A);
    let writer_A = builder.clone().build();
    // ... write to partition A
    // done with partition A and now we need to write to partition B
    // this is wrong because partition value A is still stored in the builder
    let writer_B = builder.clone().build();

An alternative is to add a new method clone_with_partition(), but that would also be a breaking change and is less clean than build_with_partition().
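
The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `WriterBuilder`, `Writer`, the tuple-struct `PartitionKey`, and the `target_file_size` field are hypothetical, and the method is synchronous here for brevity. The builder holds only partition-independent configuration, and the partition key is supplied at build time, so a cloned builder no longer carries a stale partition value.

```rust
/// Hypothetical stand-in for a partition key.
#[derive(Clone, Debug, PartialEq)]
struct PartitionKey(String);

/// Hypothetical writer that remembers which partition it writes to.
#[derive(Debug, PartialEq)]
struct Writer {
    partition: Option<PartitionKey>,
    target_file_size: usize,
}

/// The builder holds only configuration that is the same for every partition.
#[derive(Clone)]
struct WriterBuilder {
    target_file_size: usize,
}

impl WriterBuilder {
    /// Mirrors the shape of `build_with_partition` from the diff:
    /// consumes the builder and takes the partition key as an argument.
    fn build_with_partition(self, partition_key: Option<PartitionKey>) -> Writer {
        Writer {
            partition: partition_key,
            target_file_size: self.target_file_size,
        }
    }
}

/// Builds one writer per partition from the same builder configuration.
fn two_partitions() -> (Writer, Writer) {
    let builder = WriterBuilder { target_file_size: 512 };
    // Cloning is now meaningful: the clone has no partition baked in.
    let a = builder
        .clone()
        .build_with_partition(Some(PartitionKey("A".to_string())));
    let b = builder.build_with_partition(Some(PartitionKey("B".to_string())));
    (a, b)
}
```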

crates/iceberg/src/writer/partitioning/clustered_data_writer.rs

crates/iceberg/src/writer/partitioning/fanout_writer.rs

liurenjie1024

Thanks @CTTY for this pr! Just finished first round of review, and I think we are on the right track!

liurenjie1024 · 2025-10-14T09:44:59Z

crates/iceberg/src/writer/partitioning/mod.rs

+    /// # Returns
+    ///
+    /// `Ok(())` on success, or an error if the write operation fails.
+    async fn write(&mut self, partition_key: Option<PartitionKey>, input: I) -> Result<()>;


Suggested change

-    async fn write(&mut self, partition_key: Option<PartitionKey>, input: I) -> Result<()>;
+    async fn write(&mut self, partition_key: PartitionKey, input: I) -> Result<()>;

A partitioning writer should always receive a partition key, shouldn't it?

I was planning to have the partitioning writer take care of unpartitioned data as well, but I think you are right: we can have an explicit unpartitioned writer (basically a wrapper around an IcebergWriter) and let TaskWriter decide which one to use.

Will fix this.
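
One way the split discussed above could look is sketched below. Everything here is an assumption for illustration: `TaskWriterSketch`, `UnpartitionedWriter`, `PartitionedWriter`, and the `String` key are hypothetical names, and the thread does not say how TaskWriter will actually choose. The sketch only shows the idea that the partitioned path takes a mandatory key while a separate wrapper handles unpartitioned data.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a partition key.
type PartitionKey = String;

/// Thin wrapper for tables without a partition spec.
#[derive(Default)]
struct UnpartitionedWriter {
    rows: Vec<i64>,
}

/// Partitioned path: the key is mandatory, not an `Option`.
#[derive(Default)]
struct PartitionedWriter {
    rows: HashMap<PartitionKey, Vec<i64>>,
}

impl PartitionedWriter {
    fn write(&mut self, key: PartitionKey, row: i64) {
        self.rows.entry(key).or_default().push(row);
    }
}

/// Hypothetical dispatcher that picks one of the two writers up front.
enum TaskWriterSketch {
    Unpartitioned(UnpartitionedWriter),
    Partitioned(PartitionedWriter),
}

impl TaskWriterSketch {
    fn new(is_partitioned: bool) -> Self {
        if is_partitioned {
            Self::Partitioned(PartitionedWriter::default())
        } else {
            Self::Unpartitioned(UnpartitionedWriter::default())
        }
    }

    /// Routes a row to the chosen writer; a key/writer mismatch is an error.
    fn write(&mut self, key: Option<PartitionKey>, row: i64) -> Result<(), String> {
        match (self, key) {
            (Self::Unpartitioned(w), None) => {
                w.rows.push(row);
                Ok(())
            }
            (Self::Partitioned(w), Some(k)) => {
                w.write(k, row);
                Ok(())
            }
            (Self::Unpartitioned(_), Some(_)) => {
                Err("unpartitioned table received a partition key".to_string())
            }
            (Self::Partitioned(_), None) => {
                Err("partitioned table requires a partition key".to_string())
            }
        }
    }
}
```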

crates/iceberg/src/writer/partitioning/mod.rs

crates/iceberg/src/writer/partitioning/fanout_writer.rs

liurenjie1024 · 2025-10-14T10:00:19Z

crates/iceberg/src/writer/partitioning/clustered_writer.rs

+            let partition_value = partition_key.data();
+
+            // Check if this partition has been closed already
+            if self.closed_partitions.contains(partition_value) {


It's odd to add the check here. It's the caller's responsibility to ensure that the input is sorted; if it isn't, we should not throw an error.

I was following Java's behavior when writing this.

Looking at the original Java PR, I don't see an explicit explanation, but I think this forces users to be aware of whether their data source is sorted, and helps surface hidden performance issues.

liurenjie1024 · 2025-10-14T10:04:21Z

crates/iceberg/src/writer/mod.rs

-    /// Build the iceberg writer.
-    async fn build(self) -> Result<Self::R>;
+    /// Build the iceberg writer for an optional partition key.
+    async fn build_with_partition(self, partition_key: Option<PartitionKey>) -> Result<Self::R>;


I'm fine with this change, but I'd like one further change, as follows:

async fn build(&self, partition_key: Option<PartitionKey>) -> Result<Self::R>

If the builder can be reused to create the actual IcebergWriter, I want to avoid cloning it.
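
The effect of taking `&self` can be sketched like this. The types are hypothetical stand-ins (`WriterBuilder`, `Writer`, a tuple-struct `PartitionKey`, a `prefix` field) and the method is synchronous for brevity; only the `build(&self, partition_key)` shape comes from the suggestion above. Because the builder is borrowed rather than consumed, one builder can produce any number of writers without being cloned.

```rust
/// Hypothetical stand-in for a partition key.
#[derive(Clone, Debug, PartialEq)]
struct PartitionKey(String);

/// Hypothetical writer produced by the builder.
struct Writer {
    partition: Option<PartitionKey>,
    prefix: String,
}

/// Hypothetical builder holding shared configuration.
struct WriterBuilder {
    prefix: String,
}

impl WriterBuilder {
    /// Borrows the builder, so it stays usable after each call.
    fn build(&self, partition_key: Option<PartitionKey>) -> Writer {
        Writer {
            partition: partition_key,
            prefix: self.prefix.clone(),
        }
    }
}

/// Builds one writer per key from a single, never-cloned builder.
fn build_many(keys: &[&str]) -> Vec<Writer> {
    let builder = WriterBuilder { prefix: "data".to_string() };
    keys.iter()
        .map(|k| builder.build(Some(PartitionKey(k.to_string()))))
        .collect()
}
```

Individual configuration fields may still be copied into each writer, as `prefix` is here, but the builder itself no longer needs a `Clone` bound at the call site.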

liurenjie1024 · 2025-10-15T09:37:29Z

crates/iceberg/src/writer/partitioning/fanout_writer.rs

+/// * `B` - The inner writer builder type
+/// * `I` - Input type (defaults to `RecordBatch`)
+/// * `O` - Output collection type (defaults to `Vec<DataFile>`)
+#[derive(Clone)]


Why do we need this? Cloning a FanoutWriter is ambiguous, since it holds state such as open writers and data files.

liurenjie1024 · 2025-10-15T09:41:03Z

crates/iceberg/src/writer/partitioning/clustered_writer.rs

+            let partition_value = partition_key.data();
+
+            // Check if this partition has been closed already
+            if self.closed_partitions.contains(partition_value) {


I'm fine with following Java's logic.
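
The closed-partition check under discussion can be sketched as below. This is a simplified synchronous model, not the PR's code: `ClusteredSketch`, the `String` key, and the `Vec<i64>` buffers are hypothetical, while the `closed_partitions` name is borrowed from the diff. It shows a writer that keeps only one partition open at a time and returns an error when input revisits a partition that was already closed.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a partition key.
type PartitionKey = String;

/// Simplified clustered writer: at most one partition is open at a time.
#[derive(Default)]
struct ClusteredSketch {
    current: Option<(PartitionKey, Vec<i64>)>,
    closed_partitions: HashSet<PartitionKey>,
    files: Vec<(PartitionKey, Vec<i64>)>,
}

impl ClusteredSketch {
    /// Writes rows for `key`; fails if `key` was closed earlier,
    /// which means the input was not clustered by partition.
    fn write(&mut self, key: PartitionKey, rows: &[i64]) -> Result<(), String> {
        if self.closed_partitions.contains(&key) {
            return Err(format!(
                "partition {key} was already closed; input is not clustered"
            ));
        }
        let same = matches!(&self.current, Some((k, _)) if *k == key);
        if !same {
            // A new partition started: close the previous one first.
            self.roll();
            self.current = Some((key, Vec::new()));
        }
        if let Some((_, buf)) = self.current.as_mut() {
            buf.extend_from_slice(rows);
        }
        Ok(())
    }

    /// Closes the current partition, if any, and records it as closed.
    fn roll(&mut self) {
        if let Some((k, buf)) = self.current.take() {
            self.closed_partitions.insert(k.clone());
            self.files.push((k, buf));
        }
    }

    /// Consumes the writer and returns the finished "files" in write order.
    fn close(mut self) -> Vec<(PartitionKey, Vec<i64>)> {
        self.roll();
        self.files
    }
}
```

Failing fast here matches the reasoning given in the thread: silently reopening a partition would still produce correct data but would hide the fact that the source is not sorted.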

Add clustered and fanout writer

b44757e

CTTY added 4 commits October 9, 2025 21:34

fix usages

36bac11

daily clippy fix

34917d4

Merge branch 'main' into ctty/parpar-new

97ed0ee

Merge branch 'main' into ctty/parpar-new

be808c9

CTTY added 2 commits October 13, 2025 15:56

trying with generic IO

6251fc8

better naming for generic partitioning writer

19b421d

CTTY added 6 commits October 14, 2025 13:42

consume writer when closing

cd53509

partition key is a must

1f7af08

rename build_with_partition to build

ff77897

daily fmt fix

49f7197

Merge branch 'main' into ctty/parpar-new

402f072

fix doc

549d10d
