[improve] extract field in value that already contains in key #1825

leaves12138 · 2023-08-15T14:58:16Z

Key-value storage in primary-key tables is redundant.
Currently, a row is stored as: _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g
Columns "a" and "b" are stored twice: once in the key and once in the value. One possible optimization:

convert    _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g

to         _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, c, d, e, f, g
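A minimal sketch of the write-side projection, assuming a value column is redundant exactly when a column with the same name prefixed by `_KEY_` exists. The class and the helper `toStorageFields` are hypothetical and only illustrate the layout change on plain column names; they are not the PR's `RowTypeUtils.toStorageRowType`.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageLayout {
    // Assumed prefix for key columns, matching the layout in the description.
    static final String KEY_PREFIX = "_KEY_";

    // Hypothetical helper: drop every value column that already appears as a key column.
    static List<String> toStorageFields(List<String> fileFields) {
        // Collect the original names of the key columns (a, b, ...).
        List<String> keyNames = new ArrayList<>();
        for (String f : fileFields) {
            if (f.startsWith(KEY_PREFIX)) {
                keyNames.add(f.substring(KEY_PREFIX.length()));
            }
        }
        // Keep key columns, system columns, and value columns not covered by the key.
        List<String> storage = new ArrayList<>();
        for (String f : fileFields) {
            if (!keyNames.contains(f)) {
                storage.add(f);
            }
        }
        return storage;
    }

    public static void main(String[] args) {
        List<String> file = List.of("_KEY_a", "_KEY_b", "_FIELD_SEQUENCE", "_ROW_KIND",
                "a", "b", "c", "d", "e", "f", "g");
        // Prints the thin layout without the duplicated a and b.
        System.out.println(toStorageFields(file));
    }
}
```

Matching by name keeps the sketch short; a real implementation would more likely match on field IDs.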

When reading, we take a and b from the key columns and fill them back into the value.
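The read path can be sketched as follows, under the same name-matching assumption: for each original value column, read it from the thin row directly if it was kept, otherwise from the corresponding `_KEY_` column. `restoreValue` is a hypothetical helper operating on `Object[]` rows, not a Paimon API.

```java
import java.util.Arrays;
import java.util.List;

public class ValueReconstruction {
    // Assumed prefix for key columns.
    static final String KEY_PREFIX = "_KEY_";

    // Hypothetical helper: rebuild the full value row from a thin storage row.
    // storageFields: column names of the thin row as written to disk.
    // valueFields: the original value column names (a, b, c, ...).
    static Object[] restoreValue(List<String> storageFields, Object[] storageRow,
                                 List<String> valueFields) {
        Object[] value = new Object[valueFields.size()];
        for (int i = 0; i < valueFields.size(); i++) {
            String name = valueFields.get(i);
            int pos = storageFields.indexOf(name);
            if (pos < 0) {
                // Column was dropped on write; take it from the key column instead.
                pos = storageFields.indexOf(KEY_PREFIX + name);
            }
            if (pos < 0) {
                throw new IllegalArgumentException("missing column: " + name);
            }
            value[i] = storageRow[pos];
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> storage = List.of("_KEY_a", "_KEY_b", "_FIELD_SEQUENCE", "_ROW_KIND",
                "c", "d", "e", "f", "g");
        Object[] row = {1, "x", 7L, "+I", 3, 4, 5, 6, 7};
        List<String> value = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(Arrays.toString(restoreValue(storage, row, value)));
    }
}
```

In practice the index mapping would be computed once per file rather than per row; it is inlined here for readability.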

This PR is not complete; unit tests and compatibility tests still need to be done.

JingsongLi · 2023-08-15T15:09:16Z

paimon-core/src/main/java/org/apache/paimon/io/KeyValueFileWriterFactory.java

@@ -164,11 +165,14 @@ private Builder(
        public KeyValueFileWriterFactory build(
                BinaryRow partition, int bucket, CoreOptions options) {
            RowType fileRowType = KeyValue.schema(keyType, valueType);
+            RowType storageRowType = RowTypeUtils.toStorageRowType(fileRowType);


We need an option here. By default it should keep the old-style row type, because we need to stay as compatible as possible with old readers.

After 1-2 versions, we can switch the default to thin mode.
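A sketch of the suggested option gate, assuming a boolean table option that defaults to the old layout. The key `thin-mode.enabled` and the class are hypothetical placeholders for illustration, not existing Paimon options or APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ThinModeOption {
    // Hypothetical option key used only for this sketch.
    static final String THIN_MODE_KEY = "thin-mode.enabled";
    // Assumed prefix for key columns.
    static final String KEY_PREFIX = "_KEY_";

    // Defaults to false so files keep the old layout that existing readers understand.
    static boolean thinModeEnabled(Map<String, String> options) {
        return Boolean.parseBoolean(options.getOrDefault(THIN_MODE_KEY, "false"));
    }

    // Returns the column layout to write: the full layout unless thin mode is enabled.
    static List<String> fieldsToWrite(List<String> fileFields, Map<String, String> options) {
        if (!thinModeEnabled(options)) {
            return fileFields;
        }
        List<String> result = new ArrayList<>();
        for (String f : fileFields) {
            // A value column is redundant when a matching _KEY_ column exists.
            boolean duplicated = !f.startsWith(KEY_PREFIX) && fileFields.contains(KEY_PREFIX + f);
            if (!duplicated) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file = List.of("_KEY_a", "_KEY_b", "_FIELD_SEQUENCE", "_ROW_KIND",
                "a", "b", "c", "d", "e", "f", "g");
        System.out.println(fieldsToWrite(file, Map.of()));
        System.out.println(fieldsToWrite(file, Map.of(THIN_MODE_KEY, "true")));
    }
}
```

Flipping the default later then only changes the fallback value in `thinModeEnabled`, while tables that set the option explicitly keep their chosen layout.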

[improve] extract field in value that already contains in key

7f19d83

JingsongLi reviewed Aug 15, 2023


[fix] fix comment

e797327

leaves12138 marked this pull request as draft December 28, 2023 09:42

JingsongLi mentioned this pull request Feb 26, 2024

[Feature] Don't store redundant primary key columns #2893

Closed


JingsongLi closed this Dec 10, 2024
