GH-17211: [C++] Add hash32 and hash64 scalar compute functions #45001
Seems like we generate the same hash for both null and 0:
In [1]: import pyarrow as pa
In [2]: import pyarrow.compute as pc
In [3]: pc.hash_64([None])
Out[3]:
<pyarrow.lib.UInt64Array object at 0x124247be0>
[
  0
]
In [4]: pc.hash_64([0])
Out[4]:
<pyarrow.lib.UInt64Array object at 0x1033027a0>
[
  0
]
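The collision reported above (a null and the integer 0 hashing to the same value) can be avoided in principle by feeding validity into the hash along with the value bytes. The following is only a minimal pure-Python sketch of that idea, not the PR's implementation: `toy_hash_64` is a hypothetical helper, and BLAKE2b from `hashlib` stands in for Arrow's internal hashing machinery, which this thread does not describe.

```python
import hashlib
import struct


def toy_hash_64(values):
    """Return one 64-bit hash per element of a list of ints and None.

    Hypothetical illustration only: this is not how Arrow hashes values.
    """
    hashes = []
    for value in values:
        if value is None:
            # Null slot: validity tag 0 and no value bytes.
            payload = b"\x00"
        else:
            # Valid slot: validity tag 1 followed by the little-endian int64.
            payload = b"\x01" + struct.pack("<q", value)
        # BLAKE2b with an 8-byte digest gives a 64-bit result per element.
        digest = hashlib.blake2b(payload, digest_size=8).digest()
        hashes.append(int.from_bytes(digest, "little"))
    return hashes


null_hash = toy_hash_64([None])[0]
zero_hash = toy_hash_64([0])[0]
# Report whether null and 0 are distinguished by the toy hash.
print(null_hash != zero_hash)
```

Because the validity tag is part of the hashed payload, a null can never share its input bytes with any valid value, including 0. Whether the real kernel should behave this way is a design decision for the PR.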
Some first-glance comments. I'll look into the details later.
Just asking: is there any line of sight to this issue being closed? I am interested because, among other things, I believe this Arrow functionality is the underlying reason why pandas cannot currently group by a pyarrow-backed struct series.
Rationale for this change
Support for calculating elementwise hashes.
The PR adds two scalar functions, hash32() and hash64(), using the existing internal hashing machinery.
What changes are included in this PR?
Continuation of #39836 with the following changes:
Are these changes tested?
Partially; a proper test suite is in progress.
Are there any user-facing changes?
There is a new compute kernel, hash_64, available.