GH-37378: [C++] Add A Dictionary Compaction Function For DictionaryArray #37418
cpp/src/arrow/array/array_dict.cc
Outdated
dict_used[current_index] = true;
dict_used_count++;

if (dict_used_count == dict_length) {
Checking here enables skipping the rest of the dictionary, which is good. However, I think it'd also be useful to detect usage of only a slice of the dictionary. If you'd prefer not to handle that in this PR, please write a follow-up issue.
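For readers following the thread, here is a minimal standalone sketch of the marking pass and early exit being discussed. It is not the code from this PR: `DictEncoded` and `CompactDictionary` are hypothetical names, plain `std::vector`s stand in for Arrow arrays, and nulls are not handled. Only `dict_used`, `dict_used_count`, and `dict_length` mirror names from the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded array (not an Arrow type).
struct DictEncoded {
  std::vector<int32_t> indices;         // positions into `dictionary`
  std::vector<std::string> dictionary;  // dictionary values
};

// Returns a copy whose dictionary keeps only the entries that `indices`
// actually references, with the indices remapped to match.
DictEncoded CompactDictionary(const DictEncoded& input) {
  const std::size_t dict_length = input.dictionary.size();
  std::vector<bool> dict_used(dict_length, false);
  std::size_t dict_used_count = 0;

  // Marking pass: record which dictionary entries are referenced.
  for (int32_t current_index : input.indices) {
    assert(current_index >= 0 &&
           static_cast<std::size_t>(current_index) < dict_length);
    if (dict_used[current_index]) continue;
    dict_used[current_index] = true;
    dict_used_count++;
    if (dict_used_count == dict_length) {
      // Every entry is referenced, so there is nothing to compact and the
      // rest of the indices need not be scanned.
      return input;
    }
  }

  // Build the compacted dictionary and an old-index -> new-index mapping.
  std::vector<int32_t> remap(dict_length, -1);
  DictEncoded output;
  output.dictionary.reserve(dict_used_count);
  for (std::size_t i = 0; i < dict_length; ++i) {
    if (dict_used[i]) {
      remap[i] = static_cast<int32_t>(output.dictionary.size());
      output.dictionary.push_back(input.dictionary[i]);
    }
  }

  // Rewrite the indices against the compacted dictionary.
  output.indices.reserve(input.indices.size());
  for (int32_t current_index : input.indices) {
    output.indices.push_back(remap[current_index]);
  }
  return output;
}
```

With indices `{2, 0, 2}` over the dictionary `{"a", "b", "c"}`, this sketch drops the unused `"b"` and rewrites the indices against the two-entry dictionary `{"a", "c"}`.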
Hi @bkietz, do you mean that if we find the array uses only a slice of the dictionary, we should use slice() instead of Take? I'd prefer to leave that as a new issue, since we have another PR that is waiting for this PR to be merged.
Can we just use Slice outside of Compact to handle this?
I wrote a follow-up issue, #38247, to handle the optimization.
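As a rough illustration of what the deferred optimization might check, the sketch below tests whether the used dictionary entries form one contiguous run, in which case a slice of the dictionary could replace a take. This is an assumption about the idea in the thread, not the approach of #38247; `ContiguousUsedRange` is a hypothetical helper, and the `dict_used` bitmap is assumed to come from a marking pass like the one in the diff above.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper: if the used entries form a single contiguous run,
// returns {offset, length} of that run; otherwise returns std::nullopt.
std::optional<std::pair<std::size_t, std::size_t>> ContiguousUsedRange(
    const std::vector<bool>& dict_used) {
  std::size_t first = 0;
  std::size_t last = 0;
  std::size_t used_count = 0;
  for (std::size_t i = 0; i < dict_used.size(); ++i) {
    if (!dict_used[i]) continue;
    if (used_count == 0) first = i;  // first used entry seen
    last = i;                        // most recent used entry seen
    ++used_count;
  }
  if (used_count == 0) return std::nullopt;  // nothing is used

  // The run is contiguous exactly when no unused entry sits between the
  // first and last used positions.
  const std::size_t span = last - first + 1;
  if (span != used_count) return std::nullopt;
  return std::make_pair(first, span);
}
```

When a range is returned, the compacted dictionary would be the slice starting at `offset` with `length` entries, and each index would need `offset` subtracted from it. When no range is returned, the general remapping path still applies.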
Co-authored-by: Benjamin Kietzman <[email protected]>
Rest LGTM, thanks!
type = boolean();
dict_type = dictionary(index_type, type);
Should we add a test for invalid input type?
No. We can't create a DictionaryArray with an invalid index type.
Oh, I see this check:
if (data->type->id() != Type::DICTIONARY) {
  return Status::TypeError("Expected dictionary type");
}
here, but it seems this branch can never be reached.
LGTM, thanks for working on this!
Strange. All C++ tests are canceled in CI.
CI failures look unrelated. Thanks!
…naryArray (apache#37418)

### Rationale for this change
A Dictionary Compaction Function for DictionaryArray is supported.

### What changes are included in this PR?
Add a Function for Dictionary Compaction

### Are these changes tested?
Yes

Are there any user-facing changes?
No

* Closes: apache#37378

Lead-authored-by: Junming Chen <[email protected]>
Co-authored-by: Ben Harkins <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 73454b7. There were 4 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
A Dictionary Compaction Function for DictionaryArray is supported.
What changes are included in this PR?
Add a Function for Dictionary Compaction
Are these changes tested?
Yes
Are there any user-facing changes?
No