Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

persistent_dict: add complex hashing, numpy scalar hashing #184

Merged
merged 13 commits into from
Oct 11, 2023

Conversation

matthiasdiener
Copy link
Contributor

@matthiasdiener matthiasdiener commented Sep 25, 2023

Please squash

@matthiasdiener
Copy link
Contributor Author

This is ready for review.

@@ -301,6 +315,10 @@ def update_for_int(key_hash, key):
return
except OverflowError:
sz *= 2
except AttributeError:
# Numpy scalars don't have to_bytes()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better approach would be to have all numpy scalars go to the same method, in which you convert to shape-() array and then use to_bytes on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add similar functionality for ndarrays as well? maybe something like (tobytes, shape, dtype)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, definitely not!

  • They're mutable.
  • With that implementation, it would violate the fundamental property a hash satisfies, i.e. a == b imples hash(a) == hash(b). That's because the array's byte pattern depends on storage order (e.g. F/C ordering), for arrays that Numpy would otherwise consider equal.

Copy link
Contributor Author

@matthiasdiener matthiasdiener Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that tobytes() for complex256 (only that dtype) isn't stable:

import numpy as np

for i in range(10):
    k = np.complex256(1.1+2.2j)
    print(k.tobytes())
$ python t.py
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x00\x00\x00\x00\x00\x00\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@\x00\x00\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x86\xc5\xeel\x16\xfe\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@`w\x00\x00\x00\x00'
b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\xff?\x86\xc5\xeel\x16\xfe\x00\xd0\xcc\xcc\xcc\xcc\xcc\x8c\x00@`w\x00\x00\x00\x00'

(same issue with an array of complex256)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a little bit of sense to me, as (at least on x86), float128 (where complex256 is just two of those) is just the x87 (not a typo) 80-bit FP type. So there are six bytes that aren't accounted for and possibly arbitrary.

Maybe specially handle those? (Cast to float (yes, Python float, both parts for complex), call repr and hash that)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes a little bit of sense to me, as (at least on x86), float128 (where complex256 is just two of those) is just the x87 (not a typo) 80-bit FP type. So there are six bytes that aren't accounted for and possibly arbitrary.

Oh 😞 In other words, this might also affect float128 itself?

Maybe specially handle those? (Cast to float (yes, Python float, both parts for complex), call repr and hash that)?

If you prefer that idea to f1b766c, I can implement that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented your suggestion in aeb824a (using Python's complex() for converting np.complex256 because that seemed a bit simper).

pytools/persistent_dict.py Outdated Show resolved Hide resolved
@inducer inducer merged commit 47dfe4f into inducer:main Oct 11, 2023
12 checks passed
@inducer
Copy link
Owner

inducer commented Oct 11, 2023

Thanks!

@matthiasdiener matthiasdiener deleted the persistent-dict-scalar-hashing branch October 11, 2023 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants