Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: [Docs] Clarify inlined strings in VariableLengthStringView is padded with 0 #40512

Merged
merged 1 commit into from
Mar 13, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Mar 13, 2024

Rationale for this change

While implementing Variable-size Binary View Layout (thanks @ariesdevil !) in apache/arrow-rs#5481 it was not 100% clear if the inlined string was zero padded.

@bkietz noted that

The spec does say "padded with zero" https://github.com/apache/arrow/blob/main/docs/source/format/Columnar.rst?plain=1#L384 but it could be repeated in the surrounding paragraph. In any case, padded with zero is definitely the intent

    * Short strings, length <= 12
      | Bytes 0-3  | Bytes 4-15                            |
      |------------|---------------------------------------|
      | length     | data (padded with 0)                  |

What changes are included in this PR?

Add a sentence in the surrounding text to make it clear the inlined strings values are zero padded

Note I do not think this is a specification change (and therefore doesn't need a vote on the mailing list) as the spec already specifies the padding is zero (in the diagram). This simply clarifies the text to emphasize this point for ease of understanding

Are these changes tested?

Are there any user-facing changes?

@alamb alamb changed the title [Minor] Clarify VariableLengthStringView padding [Minor] Clarify VariableLengthStringView padding is 0 Mar 13, 2024
@alamb alamb changed the title [Minor] Clarify VariableLengthStringView padding is 0 [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 Mar 13, 2024
@alamb alamb changed the title [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 Mar 13, 2024
@github-actions github-actions bot added the awaiting committer review Awaiting committer review label Mar 13, 2024
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@kou kou changed the title [Minor] Clarify inlined strings in VariableLengthStringView is padded with0 MINOR: [Docs] Clarify inlined strings in VariableLengthStringView is padded with 0 Mar 13, 2024
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit bd3fab4 into apache:main Mar 13, 2024
8 of 11 checks passed
@kou kou removed the awaiting committer review Awaiting committer review label Mar 13, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Mar 13, 2024
@alamb alamb deleted the alamb/clarify_padding branch March 13, 2024 21:42
@alamb
Copy link
Contributor Author

alamb commented Mar 13, 2024

Thank you for the review @kou

Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit bd3fab4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting merge Awaiting merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants