GH-37378: [C++] Add A Dictionary Compaction Function For DictionaryArray #37418

Merged: 61 commits merged into apache:main on Oct 11, 2023

R-JunmingChen
Contributor

@R-JunmingChen R-JunmingChen commented Aug 28, 2023

Rationale for this change

This PR adds a dictionary compaction function for DictionaryArray.

What changes are included in this PR?

Add a Function for Dictionary Compaction

Are these changes tested?

Yes

Are there any user-facing changes?
No

@github-actions

⚠️ GitHub issue #37378 has been automatically assigned in GitHub to PR creator.

@R-JunmingChen R-JunmingChen marked this pull request as ready for review August 31, 2023 15:49
@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Sep 5, 2023
@github-actions github-actions bot added awaiting committer review and removed awaiting review labels Oct 10, 2023
@R-JunmingChen R-JunmingChen requested a review from bkietz October 10, 2023 03:26
dict_used[current_index] = true;
dict_used_count++;

if (dict_used_count == dict_length) {
Member

Checking here enables skipping the rest of the dictionary, which is good. However, I think it would also be useful to detect when only a slice of the dictionary is used. If you'd prefer not to handle that in this PR, please write a follow-up issue.
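For readers following the review, the early-exit check quoted above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `DictEncoded` and `CompactSketch` are hypothetical names, plain `std::vector`s stand in for Arrow arrays, and nulls are ignored. The sketch marks every dictionary entry that some index refers to, returns early once all entries are known to be used, and otherwise builds a smaller dictionary and remaps the indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded array: each element of
// `indices` points into `dictionary`. Nulls are left out to keep this short.
struct DictEncoded {
  std::vector<int32_t> indices;
  std::vector<std::string> dictionary;
};

// Returns a copy whose dictionary keeps only the values that at least one
// index refers to, with the indices remapped to the new positions.
DictEncoded CompactSketch(const DictEncoded& input) {
  const std::size_t dict_length = input.dictionary.size();
  std::vector<bool> dict_used(dict_length, false);
  std::size_t dict_used_count = 0;

  for (int32_t current_index : input.indices) {
    // Assumption of this sketch: indices are non-negative and in range.
    assert(current_index >= 0 &&
           static_cast<std::size_t>(current_index) < dict_length);
    const std::size_t pos = static_cast<std::size_t>(current_index);
    if (dict_used[pos]) continue;
    dict_used[pos] = true;
    dict_used_count++;
    if (dict_used_count == dict_length) {
      // Every entry is referenced, so there is nothing to compact and the
      // remaining indices do not need to be scanned.
      return input;
    }
  }

  // Map old dictionary positions to new ones while copying the used values.
  std::vector<int32_t> remap(dict_length, -1);
  DictEncoded out;
  out.dictionary.reserve(dict_used_count);
  for (std::size_t i = 0; i < dict_length; ++i) {
    if (!dict_used[i]) continue;
    remap[i] = static_cast<int32_t>(out.dictionary.size());
    out.dictionary.push_back(input.dictionary[i]);
  }

  // Rewrite the indices against the compacted dictionary.
  out.indices.reserve(input.indices.size());
  for (int32_t idx : input.indices) {
    out.indices.push_back(remap[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

With indices `{0, 2, 2, 0}` over the dictionary `{"a", "b", "c"}`, the unused entry `"b"` is dropped and the indices that pointed at position 2 are rewritten to point at position 1.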

Contributor Author

@R-JunmingChen R-JunmingChen Oct 11, 2023

Hi @bkietz, do you mean that if we find the indices use only a slice of the dictionary, we should use Slice() instead of Take? I'd prefer to leave that for a new issue, since another PR is waiting for this one to be merged.

Member

Can we just use Slice outside of Compact to handle this?

Contributor Author

I opened follow-up issue #38247 to handle this optimization.
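A rough sketch of the check such a follow-up could perform, assuming the per-entry usage flags have already been computed. `UsedRange` and `FindUsedSlice` are hypothetical names for illustration, not Arrow APIs. When the used entries form one contiguous run, the dictionary could be sliced at that offset and every index shifted down by the offset, rather than gathering the used values one by one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a contiguous run of used dictionary entries.
struct UsedRange {
  int64_t offset;
  int64_t length;
};

// Returns true and fills `out` when the used entries are exactly the
// positions [offset, offset + length). Returns false when no entry is used
// or when the used entries have gaps between them.
bool FindUsedSlice(const std::vector<bool>& dict_used, UsedRange* out) {
  assert(out != nullptr);
  int64_t first = -1;
  int64_t last = -1;
  int64_t used_count = 0;
  const int64_t n = static_cast<int64_t>(dict_used.size());
  for (int64_t i = 0; i < n; ++i) {
    if (!dict_used[static_cast<std::size_t>(i)]) continue;
    if (first < 0) first = i;
    last = i;
    ++used_count;
  }
  // A contiguous run spans exactly as many positions as it has used entries.
  if (used_count == 0 || last - first + 1 != used_count) return false;
  *out = UsedRange{first, used_count};
  return true;
}
```

For the flags `{false, true, true, false}` this reports offset 1 and length 2, while `{true, false, true}` is rejected because of the gap in the middle.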

@github-actions github-actions bot added awaiting changes and removed awaiting committer review labels Oct 10, 2023
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Oct 11, 2023
@R-JunmingChen R-JunmingChen requested a review from bkietz October 11, 2023 08:06
Member

@mapleFU mapleFU left a comment

Rest LGTM, thanks!


type = boolean();
dict_type = dictionary(index_type, type);

Member

Should we add a test for invalid input type?

Contributor Author

@R-JunmingChen R-JunmingChen Oct 11, 2023

No. We can't create a DictionaryArray with an invalid index type.

Member

Oh, I see a

  if (data->type->id() != Type::DICTIONARY) {
    return Status::TypeError("Expected dictionary type");
  }

here, but it seems this branch can never be reached.

Member

@bkietz bkietz left a comment

LGTM, thanks for working on this!

@github-actions github-actions bot added awaiting merge and removed awaiting change review labels Oct 11, 2023
@R-JunmingChen
Contributor Author

Strange. All C++ tests are canceled in CI.

@bkietz
Member

bkietz commented Oct 11, 2023

CI failures look unrelated. Thanks!

@bkietz bkietz merged commit 73454b7 into apache:main Oct 11, 2023
@bkietz bkietz removed the awaiting merge label Oct 11, 2023
@github-actions github-actions bot added the awaiting committer review label Oct 12, 2023
llama90 pushed a commit to llama90/arrow that referenced this pull request Oct 12, 2023
…naryArray (apache#37418)

### Rationale for this change
A Dictionary Compaction Function for DictionaryArray is supported.

### What changes are included in this PR?
Add a Function for Dictionary Compaction

### Are these changes tested?
Yes

### Are there any user-facing changes?
No
* Closes: apache#37378

Lead-authored-by: Junming Chen <[email protected]>
Co-authored-by: Ben Harkins <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Signed-off-by: Benjamin Kietzman <[email protected]>
@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 73454b7.

There were 4 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
Successfully merging this pull request may close these issues.

[C++] Add a Dictionary Compaction Function For DictionaryArray
5 participants