GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats #35345
… by the final list-view spec
…equired by the final list-view spec
  return rag_.ArrayOf(std::move(type), size, null_probability);
}

// TODO(GH-38656): Use the random array generators from testing/random.h here
@pitrou I isolated all the random-generation code in this class and removed the complicated List[View]ConcatenationChecker templates.
Sorry for finding two more nits. Feel free to ping when done!
cpp/src/arrow/array/concatenate.cc
Outdated
if (sizes[position] > 0) {
  // NOTE: Concatenate can be called during IPC reads to append delta
  // dictionaries. Avoid UB on non-validated input by doing the addition in the
  // unsigned domain. (the result can later be validated using
  // Array::ValidateFull)
  const auto displaced_offset = SafeSignedAdd(offsets[position], displacement);
  // displaced_offset>=0 is guaranteed by RangeOfValuesUsed returning the
  // smallest offset of valid and non-empty list-views.
  DCHECK_GE(displaced_offset, 0);
  dst[position] = displaced_offset;
} else {
  // Do nothing to leave dst[position] as 0.
}
I might be misreading, but is it just the same as visit_not_null(i)?
You're right. I extracted the function from below when I noticed the dup, but forgot to do the reverse-inlining above.
Pushing soon.
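For readers following this thread, here is a minimal, self-contained sketch of the displacement step discussed above. It is not the code in concatenate.cc: `WrappingSignedAdd` and `DisplaceOffsets` are hypothetical names standing in for the `SafeSignedAdd` call and the surrounding loop, the validity bitmap is ignored, and the sketch assumes a two's-complement target and a 32-bit or 64-bit offset type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the SafeSignedAdd helper used in the diff.
// The addition is done on the unsigned counterpart of T, so an overflow
// wraps around instead of being undefined behavior. Intended for 32-bit
// and 64-bit offset types only.
template <typename T>
T WrappingSignedAdd(T a, T b) {
  static_assert(std::is_signed<T>::value, "expects a signed integer type");
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(static_cast<U>(a) + static_cast<U>(b));
}

// Displaces the offsets of one input's list-views by `displacement`.
// Views with size 0 keep offset 0, mirroring the else-branch in the diff.
// Assumption: nulls are not modeled here; every entry is treated as valid.
template <typename offset_type>
std::vector<offset_type> DisplaceOffsets(const std::vector<offset_type>& offsets,
                                         const std::vector<offset_type>& sizes,
                                         offset_type displacement) {
  assert(offsets.size() == sizes.size());
  std::vector<offset_type> dst(offsets.size(), 0);
  for (std::size_t position = 0; position < offsets.size(); ++position) {
    if (sizes[position] > 0) {
      dst[position] = WrappingSignedAdd(offsets[position], displacement);
    }
  }
  return dst;
}
```

On corrupt input the wrapped result may be negative or otherwise invalid. The sketch only avoids undefined behavior in the addition itself, which matches the intent stated in the NOTE comment: validation is left to a later pass.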
@felipecrv We'll want to update https://github.com/apache/arrow/blob/main/docs/source/status.rst in a followup PR.
I will be extremely glad to send that PR.
bravo 🍺!
I've created an issue about Parquet: #38849
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 8cc71ab. There were 5 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 14 possible false positives for unstable benchmarks that are known to sometimes produce them.
…pache#37877)

### Rationale for this change
More details in the draft implementations of this spec:
- C++: apache#35345
- Go: apache#37468

### What changes are included in this PR?
- Some unrelated fixes to the spec text (I can extract these to another PR if necessary)
- Changes to the spec text
- Additions to the Flatbuffers specifications of the Arrow format

### Are these changes tested?
N/A.

### Are there any user-facing changes?
Changes in documentation and backwards compatible additions to the format spec.

* Closes: apache#37876

Lead-authored-by: Felipe Oliveira Carvalho
Co-authored-by: David Li
Co-authored-by: Antoine Pitrou
Co-authored-by: Benjamin Kietzman
Signed-off-by: Matt Topol
…E_LIST_VIEW array formats (apache#37468)

### Rationale for this change
Go implementation of apache#35345.

### What changes are included in this PR?
- [x] Add `LIST_VIEW` and `LARGE_LIST_VIEW` to datatype.go
- [x] Add `ListView` and `LargeListView` to list.go
- [x] Add `ListViewType` and `LargeListViewType` to datatype_nested.go
- [x] Add list-view builders
- [x] Implement list-view comparison in compare.go
- [x] String conversion in both directions
- [x] Validation of list-view arrays
- [x] Generation of random list-view arrays
- [x] Concatenation of list-view arrays in concat.go
- [x] JSON serialization/deserialization
- [x] Add data used for tests in `arrdata.go`
- [x] Add Flatbuffer changes
- [x] Add IPC support

### Are these changes tested?
Yes. Existing tests are being changed to also cover list-view variations as well as new tests focused solely on the list-view format.

### Are there any user-facing changes?
New structs and functions introduced.

* Closes: apache#35344

Authored-by: Felipe Oliveira Carvalho
Signed-off-by: Matt Topol
…GE_LIST_VIEW array formats (apache#35345)

### Rationale for this change
Mailing list discussion: https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb

### What changes are included in this PR?
Initial implementation of the new format in C++.

### Are these changes tested?
Unit tests are written with every commit that adds new functionality. More needs to be implemented before the (required) integration tests can be written.

### Are there any user-facing changes?
A new array format. It should have no impact on users who don't use it.

* Closes: apache#35344

Authored-by: Felipe Oliveira Carvalho
Signed-off-by: Antoine Pitrou
Rationale for this change
Mailing list discussion: https://lists.apache.org/thread/r28rw5n39jwtvn08oljl09d4q2c1ysvb
What changes are included in this PR?
Initial implementation of the new format in C++.
Are these changes tested?
Unit tests are written with every commit that adds new functionality. More needs to be implemented before the (required) integration tests can be written.
Are there any user-facing changes?
A new array format. It should have no impact on users who don't use it.
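As a rough illustration of what the new format looks like to a reader, here is a minimal sketch of a list-view as an offsets buffer, a sizes buffer, and a child values array. `ListViewSketch` and `ValueAt` are hypothetical names used only for this example and are not the Arrow C++ classes. The validity bitmap is left out, and the sketch assumes (as I understand the spec) that offsets need not be ordered and that views may share child values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative model only; not the Arrow C++ API.
struct ListViewSketch {
  std::vector<int32_t> offsets;  // start of each view in `values`
  std::vector<int32_t> sizes;    // number of child values in each view
  std::vector<int64_t> values;   // child array shared by all views

  // Copies out the i-th list: values[offsets[i] .. offsets[i] + sizes[i]).
  std::vector<int64_t> ValueAt(std::size_t i) const {
    assert(offsets.size() == sizes.size());
    assert(i < offsets.size());
    const auto begin = values.begin() + offsets[i];
    return std::vector<int64_t>(begin, begin + sizes[i]);
  }
};
```

With offsets {3, 0, 0, 1}, sizes {2, 0, 3, 2}, and child values {10, 11, 12, 13, 14}, the four lists read back as [13, 14], [], [10, 11, 12], and [11, 12]. Unlike a plain list array, each list's length comes from the sizes buffer and not from the difference between consecutive offsets.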