Skip to content

Conversation

hmelder
Copy link
Member

@hmelder hmelder commented Sep 4, 2025

-[NSNumber hash] currently converts the number to a double value and casts it to an NSUInteger. The problem with this approach is that fractional numbers with the same integral part, but different fractional part are treated as being equal.

I looked into using a proper, still non-cryptographic hash function, such as the ones proposed here (https://nullprogram.com/blog/2018/07/31/), but this is overkill for the hash method.

Apple's implementation uses the magic 0x9e3779b1 and does an integer multiplication with the numbers unsigned integer value. 0x9e3779b1 is the closest prime to 0x9e3779b9 which is the integral part of the Golden Ratio's fractional part. https://lkml.org/lkml/2016/4/29/838 features a great discussion to why this is not a great idea.

If there is a need (I still do not understand the purpose of -[NSNumber hash]) for a hash function with good scattering properties, I'd recommend we instead use SplitMix64.

@hmelder hmelder requested a review from rfm as a code owner September 4, 2025 10:40
Copy link
Contributor

@rfm rfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if not optimal, this looks like a clear improvement.

My understanding of the purpose of the hash method is that it's primarily to support the use of the object as a dictionary/map key, which is why a best performance is if the has values are evenly scattered.

In my experience, we use NSString objects as dictionary keys probably a hundred times more often than we use NSNumber objects, so if we want to optimise hash functions, NSString is a better candidate, but having an efficientd hash for NSNumber is still a good thing.

@hmelder
Copy link
Member Author

hmelder commented Sep 7, 2025

My understanding of the purpose of the hash method is that it's primarily to support the use of the object as a dictionary/map key, which is why a best performance is if the has values are evenly scattered.

If the result of the hash method is really used as a key to a dictionary, then this is probably not the right implementation. I agree that this is better than the old one, at least it is now not broken anymore, but it still does not fullfill the basic properties of a non-cryptographic hash function.

@hmelder
Copy link
Member Author

hmelder commented Sep 7, 2025

Are you sure that the result of -hash is directly used? If that is the case, then using NSNumber objects as keys into a dictionary is completely broken.

Copy link
Contributor

@rfm rfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, on re-reading this, I no longer think this is an improvement on the original version.
Originally I was simply thinking about the ability of the returned hash to be used to evenly distribute keys among hash buckets to provide optimal performance of hash table (dictionary/map) lookups: a dictionary consists of an array of 'buckets' (efficiently selected by a hash % number-of-buckets calculation) with each bucket containing a linked list of nodes ... an even hash distribution minimises the length of the linked lists, so performance drops if we have an uneven hash.
However, that's not the only consideration ... if we hash two keys with the same value and get two different different hash values then those two keys can select different buckets so that a value stored in the dictionary with one key cannot be retrieved using the other. My concern is that the latest code uses the raw data of float and double numbers rather than the actual numeric value of the key, and I suspect that means that if we have two keys with the same numeric value they will give different hashes depending on whether they are stored as int/float/double. That really breaks the -hash method, by making the map/dictionary fail (rather than just being inefficient/slow).

@hmelder
Copy link
Member Author

hmelder commented Sep 8, 2025

My concern is that the latest code uses the raw data of float and double numbers rather than the actual numeric value of the key, and I suspect that means that if we have two keys with the same numeric value they will give different hashes depending on whether they are stored as int/float/double

Are you concerned about floating point values that are within machine epsilon distance and thus needs to be treated as being equivalent?

@rfm
Copy link
Contributor

rfm commented Sep 8, 2025

I'm not really familiar with the underlying representations, but my worry is that the binary representations of the same number (say 42) differ depending on the type of the number.
So if 42 is stored as a double, the statement 'return c.u;' would return a different value from if 42 is stored as a double, and both would be different to 42 stored as an int.
Any hash code for NSNumber ideally should ensure that it returns the same hash for the same numeric value, irrespective of the binary representation in use to store that numeric value.
Now, we could argue that if you store values using different types then they should/do have different hashes, but I think that breaks expectations: If two numbers are equal then their hashes must be the same.
[[NSNumber numberWithFloat: 42] hash] == [[NSNumber numberWithInt: 42] hash]
This should apply for all cases where there is no loss of information converting between types (clearly not all numbers can be represented in all numeric types).

@rfm
Copy link
Contributor

rfm commented Sep 8, 2025

In short, for two NSNumbers A and B, the implementation needs to be such that
[A isEqual: B] => [A hash] == [B hash]

@hmelder
Copy link
Member Author

hmelder commented Sep 8, 2025

I will investigate the effect on scattering when integer and fractional part are hashed seperately and then added togetter. This avoids working with the raw floating point represenation.

@rfm
Copy link
Contributor

rfm commented Sep 8, 2025

That sounds like a good approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants