-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
persistent_dict: add complex hashing, numpy scalar hashing #184
persistent_dict: add complex hashing, numpy scalar hashing #184
Conversation
ffa92cf
to
ee1691c
Compare
This is ready for review. |
pytools/persistent_dict.py
Outdated
@@ -301,6 +315,10 @@ def update_for_int(key_hash, key): | |||
return | |||
except OverflowError: | |||
sz *= 2 | |||
except AttributeError: | |||
# Numpy scalars don't have to_bytes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a better approach would be to have all numpy scalars go to the same method, in which you convert to shape-()
array and then use to_bytes
on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add similar functionality for ndarrays as well? maybe something like (tobytes, shape, dtype)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, definitely not!
- They're mutable.
- With that implementation, it would violate the fundamental property a hash satisfies, i.e.
a == b
impleshash(a) == hash(b)
. That's because the array's byte pattern depends on storage order (e.g. F/C ordering), for arrays that Numpy would otherwise consider equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that tobytes()
for complex256 (only that dtype) isn't stable:
import numpy as np
for i in range(10):
k = np.complex256(1.1+2.2j)
print(k.tobytes())
$ python t.py
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x86\xc5\xeel\x16\xfe\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@`w\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x86\xc5\xeel\x16\xfe\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@`w\x00\x00\x00\x00'
(same issue with an array of complex256)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a little bit of sense to me, as (at least on x86), float128
(where complex256
is just two of those) is just the x87 (not a typo) 80-bit FP type. So there are six bytes that aren't accounted for and possibly arbitrary.
Maybe specially handle those? (Cast to float
(yes, Python float
, both parts for complex), call repr
and hash that)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes a little bit of sense to me, as (at least on x86),
float128
(wherecomplex256
is just two of those) is just the x87 (not a typo) 80-bit FP type. So there are six bytes that aren't accounted for and possibly arbitrary.
Oh 😞 In other words, this might also affect float128
itself?
Maybe specially handle those? (Cast to
float
(yes, Pythonfloat
, both parts for complex), callrepr
and hash that)?
If you prefer that idea to f1b766c, I can implement that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented your suggestion in aeb824a (using Python's complex()
for converting np.complex256
because that seemed a bit simper).
Thanks! |
Please squash