Improve append #46

JanKaul · 2024-10-24T06:39:09Z

This PR improves the append operation by selecting a single manifest file to which the new datafiles are appended. If the number of datafiles exceeds a certain threshold, that manifest file is split into smaller manifest files.

This should improve insert performance, as well as the overall structure of the manifest_list/manifest tree.
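To make the threshold rule concrete, here is a minimal sketch of how the number of binary splits could be derived from a per-manifest limit. The name `compute_n_splits` and its parameters are hypothetical and chosen for illustration; the computation actually used in this PR is not shown in the description and may differ.

```rust
/// Hypothetical helper (not the PR's code): returns the smallest number of
/// binary splits such that every resulting manifest holds at most `limit`
/// datafiles. `existing` is the number of datafiles already in the selected
/// manifest, `new` is the number of datafiles being appended.
fn compute_n_splits(existing: usize, new: usize, limit: usize) -> u32 {
    assert!(limit > 0, "limit must be positive");
    let mut per_manifest = existing + new;
    let mut n_splits = 0;
    // Each split halves the manifest; round up so the larger half is counted.
    while per_manifest > limit {
        per_manifest = (per_manifest + 1) / 2;
        n_splits += 1;
    }
    n_splits
}
```

With a limit of 10, a manifest that ends up with 11 datafiles needs one split, and one with 41 datafiles needs three.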

iceberg-rust/src/util/mod.rs

iceberg-rust/src/table/transaction/append.rs

iceberg-rust/Cargo.toml

rdettai · 2024-10-29T09:53:42Z

iceberg-rust/src/table/transaction/append.rs

+}
+
+/// Splits the datafiles *n_split* times to decrease the number of datafiles per manifest. 1 split returns 2 output vectors, 2 splits return 4, 3 splits return 8 and so on.
+pub(crate) fn split_datafiles(


Ideally, this should be unit tested as well.

Creating mocked data for a unit test is quite difficult here. I expanded a datafusion test to cover the manifest splitting case.
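For readers following the thread, the doubling behaviour described in the doc comment (1 split gives 2 vectors, 2 splits give 4, and so on) can be sketched as below. This is a simplified illustration, not the code under review: `split_once` and `split_n` are hypothetical names, and the split here is purely positional, whereas the function in the PR operates on datafiles and may choose its split point differently.

```rust
/// Hypothetical helper: splits `files` into a lower and an upper half by position.
fn split_once<T>(mut files: Vec<T>) -> [Vec<T>; 2] {
    let upper = files.split_off(files.len() / 2);
    [files, upper]
}

/// Applies `split_once` to every part `n_split` times, so the result always
/// contains 2^n_split vectors (some may be empty if the input is short).
fn split_n<T>(files: Vec<T>, n_split: u32) -> Vec<Vec<T>> {
    let mut parts = vec![files];
    for _ in 0..n_split {
        // Every existing part is replaced by its two halves.
        parts = parts.into_iter().flat_map(split_once).collect();
    }
    parts
}
```

Because each round replaces every part with two halves, the output length depends only on `n_split`, which makes the property easy to assert in a test even without mocked manifests.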

iceberg-rust-spec/src/spec/values.rs

iceberg-rust/src/util/mod.rs

rdettai · 2024-10-29T10:50:49Z

iceberg-rust/src/table/transaction/operation.rs

Operation.execute() is too big, which makes it hard to read the change. Could you refactor it?

datafusion_iceberg/src/table.rs

Improve append

JanKaul added 5 commits October 23, 2024 17:10

find manifest

95e59d4

fix compiler errors

79a2b92

fix merge issues

145c5bc

improve rewrite operation

78f588f

fix clippy warnings

24ab564

rdettai reviewed Oct 24, 2024

iceberg-rust/src/util/mod.rs

JanKaul added 3 commits October 24, 2024 10:50

fix out of bound bug in sub

2f8b7d2

fix_cmp _dist bug

ec1aefc

add tests for contains

9a57a75

rdettai reviewed Oct 24, 2024

JanKaul added 11 commits October 24, 2024 15:26

tests for cmp_with_priority

71c3e93

expand with node test

3b14ef9

test expand rectangle

b09e813

remove generic from rectangle

db15ad8

fix partition column type

7725fde

fix public function

037b982

fix identity

98d497d

rename struct_to_smallvec

5f07a9a

add documentation

023d547

implement sub for string

e050686

clippy fixes

dc7a29d

rdettai reviewed Oct 26, 2024

iceberg-rust/src/table/transaction/append.rs (outdated)

iceberg-rust/Cargo.toml

JanKaul added 2 commits October 27, 2024 06:46

add documentation

e715e2c

expand documentation

49cb036

rdettai reviewed Oct 29, 2024

JanKaul added 5 commits October 29, 2024 15:48

remove public from split_datafiles_once

12af6d7

use normal vec for split_datafiles

76de2f8

remove tryadd

93b768d

implement trysub for uuid and fixed

e677ec6

add assert for splitting manifest

97927a5

JanKaul and others added 7 commits October 29, 2024 17:43

select manifest

b5ca169

create new manifest-writer

7b52090

refactor manifest-writer

78596fa

fix clippy warnings

4cc2bff

Refactor and fix select_manifest

f3532f9

Refactor n_split computation

12de20a

Merge pull request #47 from rdettai/improve-append

7861b35

Improve append

JanKaul merged commit f2fff11 into main Oct 31, 2024
1 check passed

JanKaul deleted the improve-append branch October 31, 2024 16:46
