[improve] extract field in value that already contains in key #1825

Closed
wants to merge 2 commits into from


@leaves12138 leaves12138 commented Aug 15, 2023

Key-value storage in primary-key tables is redundant.
Currently, the format of a row is: _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g
The columns "a" and "b" are stored twice, once in the key and once in the value. An optimization is:

convert    _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, a, b, c, d, e, f, g

to         _KEY_a, _KEY_b, _FIELD_SEQUENCE, _ROW_KIND, c, d, e, f, g

When reading, we take a and b from the key and fill them back into the value.
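
To illustrate the idea, here is a minimal sketch of dropping the key columns on write and restoring them on read. It uses plain `Object[]` rows instead of the project's row classes; `ThinValueSketch`, `toThinValue`, `restoreValue`, and `keyPositions` are hypothetical names for this example, not code from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; rows are modelled as Object[] for brevity. */
final class ThinValueSketch {

    /**
     * Drops the value columns listed in keyPositions, because the key
     * part of the row already stores them.
     */
    static Object[] toThinValue(Object[] fullValue, int[] keyPositions) {
        List<Object> thin = new ArrayList<>();
        for (int i = 0; i < fullValue.length; i++) {
            if (indexOf(keyPositions, i) < 0) {
                thin.add(fullValue[i]);
            }
        }
        return thin.toArray();
    }

    /**
     * Rebuilds the full value: positions listed in keyPositions are read
     * from the key, all other positions come from the thin value in order.
     */
    static Object[] restoreValue(Object[] key, Object[] thinValue, int[] keyPositions) {
        Object[] full = new Object[thinValue.length + keyPositions.length];
        int next = 0;
        for (int i = 0; i < full.length; i++) {
            int k = indexOf(keyPositions, i);
            full[i] = k >= 0 ? key[k] : thinValue[next++];
        }
        return full;
    }

    /** Returns the index of target in positions, or -1 if absent. */
    private static int indexOf(int[] positions, int target) {
        for (int i = 0; i < positions.length; i++) {
            if (positions[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Object[] key = {1, "x"};                   // _KEY_a, _KEY_b
        Object[] value = {1, "x", 3.0, 4L, true};  // a, b, c, d, e
        int[] keyPositions = {0, 1};               // a and b sit at value positions 0 and 1
        Object[] thin = toThinValue(value, keyPositions);
        System.out.println(Arrays.toString(thin));
        System.out.println(Arrays.toString(restoreValue(key, thin, keyPositions)));
    }
}
```

Here `keyPositions[k]` is assumed to give the position in the value row of the k-th key field, so key fields do not have to be the leading value columns.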

This PR is not complete; unit tests and compatibility tests still need to be done.

@@ -164,11 +165,14 @@ private Builder(
        public KeyValueFileWriterFactory build(
                BinaryRow partition, int bucket, CoreOptions options) {
            RowType fileRowType = KeyValue.schema(keyType, valueType);
            RowType storageRowType = RowTypeUtils.toStorageRowType(fileRowType);

We need an option here. By default, we should keep the old-style row type, because we need to stay as compatible with old readers as possible.

Then, after 1-2 versions, we can switch the default to thin mode.
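
A rough sketch of what such a switch could look like, assuming a boolean flag that defaults to the old layout. `StorageLayoutSketch`, `THIN_MODE_DEFAULT`, and `storageColumns` are made-up names for illustration and are not existing options or APIs of the project; column names are plain strings rather than a real `RowType`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; column names stand in for a real row type. */
final class StorageLayoutSketch {

    /** Assumed default: keep the old (full) layout so old readers still work. */
    static final boolean THIN_MODE_DEFAULT = false;

    /**
     * Builds the stored column list. In thin mode, value fields that are
     * already present as key fields are left out of the value part.
     */
    static List<String> storageColumns(
            List<String> keyFields, List<String> valueFields, boolean thinMode) {
        List<String> columns = new ArrayList<>();
        for (String keyField : keyFields) {
            columns.add("_KEY_" + keyField);
        }
        columns.add("_FIELD_SEQUENCE");
        columns.add("_ROW_KIND");
        for (String valueField : valueFields) {
            if (!thinMode || !keyFields.contains(valueField)) {
                columns.add(valueField);
            }
        }
        return columns;
    }
}
```

Calling `storageColumns(keys, values, StorageLayoutSketch.THIN_MODE_DEFAULT)` would keep today's format; flipping the default later would only change the value of the flag.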

@leaves12138 leaves12138 marked this pull request as draft December 28, 2023 09:42
@JingsongLi JingsongLi closed this Dec 10, 2024