GH-37876: [Format] Add list-view specification to arrow format #37877
Just a few nits in wording; otherwise looks good.
LGTM, thanks!
Co-authored-by: David Li <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
At least in Rust the rule is that a slice must have an end index less than or equal to the length of the data being sliced. So in this case a slice would be valid iff It has been a while since I worked in C++, but if I recall correctly this is consistent with the way iterators work as well.
I will rewrite the text to say that non-empty nulls are allowed, then.
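The bounds rule discussed above can be illustrated with a small sketch. It assumes that list-view slot `i` covers the child range `[offsets[i], offsets[i] + sizes[i])`, so the Rust-style condition is that this end index is at most the child length. It also shows a null slot with a non-zero size, as allowed by the rewritten text. The function names are hypothetical, plain Python lists stand in for Arrow buffers, and applying the bounds check to null slots as well is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def list_view_slot_is_valid(offset: int, size: int, child_length: int) -> bool:
    """Return True if the slot's range [offset, offset + size) fits in the child.

    Like a Rust slice, the end index may equal the child length but not exceed it.
    """
    if offset < 0 or size < 0:
        return False
    return offset + size <= child_length


def decode_list_view(
    offsets: Sequence[int],
    sizes: Sequence[int],
    validity: Sequence[bool],
    child: Sequence[int],
) -> List[Optional[List[int]]]:
    """Materialize list-view slots as Python lists (None for null slots)."""
    result: List[Optional[List[int]]] = []
    for offset, size, is_valid in zip(offsets, sizes, validity):
        # Assumption: bounds are checked for null slots as well as valid ones.
        if not list_view_slot_is_valid(offset, size, len(child)):
            raise ValueError(f"slot [{offset}, {offset + size}) is out of bounds")
        if is_valid:
            result.append(list(child[offset:offset + size]))
        else:
            # A null slot may still cover a non-empty segment; its contents are ignored.
            result.append(None)
    return result


child = [1, 2, 3, 4, 5]
# The middle slot is null but has size 3 (a "non-empty null").
print(decode_list_view([0, 2, 3], [2, 3, 2], [True, False, True], child))
# [[1, 2], None, [4, 5]]
print(list_view_slot_is_valid(3, 2, len(child)))  # end == length: True
print(list_view_slot_is_valid(4, 2, len(child)))  # end > length: False
```

An empty slot whose offset equals the child length (for example `offset=5, size=0` above) also passes, mirroring how an empty Rust slice `&data[len..len]` is valid.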
LGTM
docs/source/format/Columnar.rst
@@ -100,15 +100,15 @@ Arrays are defined by a few pieces of metadata and data:
Nested arrays additionally have a sequence of one or more sets of
these items, called the **child arrays**.

-Each logical data type has a well-defined physical layout. Here are
-the different physical layouts defined by Arrow:
+Each logical data type has one or more well-defined physical layouts. Here
I would keep the singular. There is no disjunction in Arrow (unlike Parquet) between "logical" data type and physical layout. ListView and StringView are simply distinct types.
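To illustrate why ListView is better described as a distinct type than as a second layout of List, here is a hedged sketch comparing the two buffer arrangements for the same values. It assumes the List layout of `n + 1` monotonic offsets and the ListView layout of `n` offsets plus `n` sizes, ignores validity bitmaps, and uses hypothetical helper names with Python lists standing in for buffers.

```python
from typing import List, Sequence, Tuple


def list_to_list_view(offsets: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Convert List offsets (n + 1 entries) into ListView offsets and sizes (n each)."""
    view_offsets = list(offsets[:-1])
    sizes = [end - start for start, end in zip(offsets[:-1], offsets[1:])]
    return view_offsets, sizes


def decode_list(offsets: Sequence[int], child: Sequence[str]) -> List[List[str]]:
    """List: slot i spans [offsets[i], offsets[i + 1])."""
    return [list(child[start:end]) for start, end in zip(offsets[:-1], offsets[1:])]


def decode_view(
    offsets: Sequence[int], sizes: Sequence[int], child: Sequence[str]
) -> List[List[str]]:
    """ListView: slot i spans [offsets[i], offsets[i] + sizes[i])."""
    return [list(child[o:o + s]) for o, s in zip(offsets, sizes)]


child = ["a", "b", "c", "d"]
list_offsets = [0, 1, 1, 4]
view_offsets, sizes = list_to_list_view(list_offsets)
print(view_offsets, sizes)  # [0, 1, 1] [1, 0, 3]
print(decode_list(list_offsets, child) == decode_view(view_offsets, sizes, child))  # True

# ListView offsets need not be monotonic and segments may overlap,
# so this array has no List encoding over the same child without copying.
print(decode_view([1, 0, 1], [3, 1, 3], child))
# [['b', 'c', 'd'], ['a'], ['b', 'c', 'd']]
```

Any List array converts to a ListView array without touching the child, but not the other way around, which is one way to see why the two are treated as separate types.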
I will change this back to singular and all the other places I've changed it. But in the future, the "logical data type" terminology should probably be removed altogether because it's very confusing.
I definitely agree with that. The spec was often confusing to me at the start.
Thanks a lot @felipecrv!
@bkietz Any other comments?
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6d551aa. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.
…pache#37877)

### Rationale for this change

More details in the draft implementations of this spec:

- C++: apache#35345
- Go: apache#37468

### What changes are included in this PR?

- Some unrelated fixes to the spec text (I can extract these to another PR if necessary)
- Changes to the spec text
- Additions to the Flatbuffers specifications of the Arrow format

### Are these changes tested?

N/A.

### Are there any user-facing changes?

Changes in documentation and backwards compatible additions to the format spec.

* Closes: apache#37876

Lead-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: David Li <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
Rationale for this change
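More details in the draft implementations of this spec:
- C++: apache#35345
- Go: apache#37468
What changes are included in this PR?
- Some unrelated fixes to the spec text (I can extract these to another PR if necessary)
- Changes to the spec text
- Additions to the Flatbuffers specifications of the Arrow format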
Are these changes tested?
N/A.
Are there any user-facing changes?
Changes in documentation and backwards compatible additions to the format spec.