Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44393: [C++][Compute] Swizzle vector functions #44394

Open
wants to merge 58 commits into
base: main
Choose a base branch
from

Conversation

zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Oct 13, 2024

Rationale for this change

For background please see #44393.

When implementing the "scatter" function requested in #44393, I found it also useful to make it a public vector API. After a painful thinking, I decided to name it "permute". And when implementing permute, I found it fairly easy to implement it by first computing the "reverse indices" of the positions, and then invoking the existing "take", where I think "reverse_indices" itself can also be a useful public vector API. Thus the PR categorized them as "placement functions".

What changes are included in this PR?

Implement placement API reverse_indices and permute, where permute(values, indices) is implemented as take(values, reverse_indices(indices)).

I also put a small UT Permute.IfElse demonstrating how permute can serve as a building block of implementing special forms.

Are these changes tested?

UT included.

Are there any user-facing changes?

Yes, new public APIs added. Documents updated.

@zanmato1984 zanmato1984 requested a review from felipecrv October 13, 2024 16:33
Copy link

⚠️ GitHub issue #44393 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 4, 2024
@zanmato1984
Copy link
Contributor Author

Hi @felipecrv , the renaming is done. Would you like to proceed with the review? Thank you.

@zanmato1984 zanmato1984 changed the title GH-44393: [C++][Compute] Placement vector functions GH-44393: [C++][Compute] Swizzle vector functions Nov 7, 2024
@zanmato1984
Copy link
Contributor Author

Hi @pitrou @felipecrv @mapleFU , shall we move on with this? I'll need this for my new (and possibly formal) implementation of special form.

Also cc @bkietz .

Thanks.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zanmato1984 and sorry for the delay in reviewing. You'll find some comments below.

Also, can you update this PR with recent git main and fix any conflicts?

docs/source/cpp/compute.rst Outdated Show resolved Hide resolved
docs/source/cpp/compute.rst Outdated Show resolved Hide resolved
cpp/src/arrow/compute/api_vector.h Outdated Show resolved Hide resolved
cpp/src/arrow/compute/api_vector.h Outdated Show resolved Hide resolved
/// \brief The max value in the input indices to process. Any values with indices that
/// are greater than this value will be ignored. If negative, this value will be set to
/// the length of the input minus 1.
int64_t max_index = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it clearer to name it something like output_length? (you would have to offset its definition by 1)
@felipecrv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it was exactly output_length until I modified it according to @felipecrv 's suggestion. See #44394 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer output_length as it's more straightforward and saves a little implementation code actually. On the other hand, max_index contains more "semantic" though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahah. Then perhaps we can just update the doc to say that the output length will be max_index + 1 if this option is set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Is updating here 53f3c33 (#44394) enough?

cpp/src/arrow/compute/kernels/vector_swizzle.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_swizzle.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_swizzle.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_swizzle.cc Outdated Show resolved Hide resolved
cpp/src/arrow/compute/kernels/vector_swizzle.cc Outdated Show resolved Hide resolved
@zanmato1984
Copy link
Contributor Author

Hi @pitrou , you will see my commits and comments one by one, this is me addressing each of your comments. I'll explicitly let you know once they are all done. Before that, you can just ignore the notification.

Thank you for the review, appreciate that!

@zanmato1984
Copy link
Contributor Author

Hi @pitrou , I think I've addressed all the comments before. Would you take a second look? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants