You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Bytes.hashCode() is implemented in a sub-optimal way:
@OverridepublicinthashCode() {
inth = 1;
for (longi = length() - 1; i >= 0; i--) {
h = 31 * h + getByte(i);
}
returnh;
}
Instead, we can use Arrays.hashCode(), which leverages vector ops support in modern CPUs.
There is one issue, though. The existing hashCode() may not be changed, not yet. If it's changed, hash code values will be changed, too. If a hash code is persisted anywhere (e.g. in MerkleDb), this place will be broken. We've seen it in the past: a key was stored on disk with one hash code, then hashCode() implementation was changed, and the key couldn't be retrieved from disk any longer.
So I propose:
Introduce a separate method in Bytes, for example, fastHashCode(). This is what this ticket is about
When we switch to Virtual Mega Map, use this method instead of hashCode() to store keys/values on disk. Migration to the mega map doesn't have to be backwards compatible, all data is migrated anyway
Some time later, hashCode() can be changed the same way as fastHashCode(), and the latter can be removed. All usages of fastHashCode() is the main repo will be replaced with hashCode(). Since implementation is the same, it will be backwards compatible
In-memory usages of hashCode(), e.g. when an object is stored in a Java collection, should be fine, since they are not persisted across node runs (no traces in state snapshots)
The text was updated successfully, but these errors were encountered:
Arrays.hashCode() only works with full arrays, there is no way to hash a part of an array (offset + length). It makes this method unusable for sliced Bytes objects (created using Bytes.slice())
Arrays.hashCode() value is not equal to existing Bytes.hashCode()
Arrays.hashCode() calls ArraysSupport.vectorizedHashCode(), which is annotated with IntrinsicCandidate. It means, this method may or may not be optimized by VM
Optimized and unoptimized versions of ArraysSupport.IntrinsicCandidate() produce different values. We need stable and predictable hashCode() implementation for Bytes, so Arrays.hashCode() doesn't look like a viable option
And some numbers:
On mac, Arrays.hashCode() is 50-60% faster than existing Bytes.hashCode()
I suspect Arrays.hashCode() is not optimized by JVM on mac. With some tweaks to our current implementation, it can be made almost as fast as Arrays.hashCode()
On linux, Arrays.hashCode() is 3-4 times faster than the current implementation
Unoptimized version of Arrays.hashCode() is about 25% faster than the current implementation, but with the same tweaks the difference can be eliminated
So I am going to repurpose this ticket to improve existing Bytes.hashCode() rather than to provide a new "fast" version of this method. These improvements will produce same values as hashCode() does today, i.e. new implementation will be fully backwards compatible. It will also be predictable, so we don't have to assume JVM will or will not run some optimizations (which result in different hash code values), hash codes will be always calculated the very same way.
artemananiev
changed the title
Bytes.fastHashCode()
Bytes.hashCode() can be optimized
Feb 12, 2025
Bytes.hashCode()
is implemented in a sub-optimal way:Instead, we can use
Arrays.hashCode()
, which leverages vector ops support in modern CPUs.There is one issue, though. The existing
hashCode()
may not be changed, not yet. If it's changed, hash code values will be changed, too. If a hash code is persisted anywhere (e.g. in MerkleDb), this place will be broken. We've seen it in the past: a key was stored on disk with one hash code, then hashCode() implementation was changed, and the key couldn't be retrieved from disk any longer.So I propose:
Bytes
, for example,fastHashCode()
. This is what this ticket is abouthashCode()
to store keys/values on disk. Migration to the mega map doesn't have to be backwards compatible, all data is migrated anywayhashCode()
can be changed the same way asfastHashCode()
, and the latter can be removed. All usages offastHashCode()
is the main repo will be replaced withhashCode()
. Since implementation is the same, it will be backwards compatiblehashCode()
, e.g. when an object is stored in a Java collection, should be fine, since they are not persisted across node runs (no traces in state snapshots)The text was updated successfully, but these errors were encountered: