Bytes.hashCode() can be optimized #389

artemananiev · 2025-02-07T18:49:09Z

Bytes.hashCode() is implemented in a sub-optimal way:

    @Override
    public int hashCode() {
        int h = 1;
        for (long i = length() - 1; i >= 0; i--) {
            h = 31 * h + getByte(i);
        }
        return h;
    }

Instead, we can use Arrays.hashCode(), which leverages vector ops support in modern CPUs.

There is one issue, though. The existing hashCode() cannot be changed yet. If it changes, hash code values change, too. If a hash code is persisted anywhere (e.g. in MerkleDb), that data will break. We've seen this in the past: a key was stored on disk with one hash code, then the hashCode() implementation was changed, and the key could no longer be retrieved from disk.
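To make the failure mode concrete, here is a minimal sketch. It assumes a hypothetical on-disk index that picks a bucket from the low bits of the hash code; `PersistedHashDemo`, `BUCKETS`, and `bucket()` are made up for illustration and do not describe MerkleDb's real layout.

```java
// Hypothetical model of a persisted, hash-indexed store (not MerkleDb's actual format).
class PersistedHashDemo {
    static final int BUCKETS = 16; // assumed bucket count, power of two

    // Same arithmetic as the existing Bytes.hashCode(): bytes are visited last to first.
    static int oldHash(byte[] key) {
        int h = 1;
        for (int i = key.length - 1; i >= 0; i--) {
            h = 31 * h + key[i];
        }
        return h;
    }

    // A changed implementation: same formula, but bytes are visited first to last.
    static int newHash(byte[] key) {
        int h = 1;
        for (byte b : key) {
            h = 31 * h + b;
        }
        return h;
    }

    // Bucket index derived from the low bits of the hash code.
    static int bucket(int hash) {
        return hash & (BUCKETS - 1);
    }

    public static void main(String[] args) {
        byte[] key = {1, 2};
        String[] disk = new String[BUCKETS]; // stands in for data that survives a restart

        // The value is written while the old implementation is in use.
        disk[bucket(oldHash(key))] = "value";
        System.out.println("written to bucket " + bucket(oldHash(key))); // written to bucket 0

        // After the implementation changes, the lookup goes to a different bucket.
        int b = bucket(newHash(key));
        System.out.println("lookup in bucket " + b + " finds: " + disk[b]); // lookup in bucket 2 finds: null
    }
}
```

The value is still on "disk", but the changed hash code no longer leads to it.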

So I propose:

Introduce a separate method in Bytes, for example, fastHashCode(). This is what this ticket is about.
When we switch to Virtual Mega Map, use this method instead of hashCode() to store keys/values on disk. Migration to the mega map doesn't have to be backwards compatible, since all data is migrated anyway.
Some time later, hashCode() can be changed to match fastHashCode(), and the latter can be removed. All usages of fastHashCode() in the main repo will be replaced with hashCode(). Since the implementation is the same, this will be backwards compatible.
In-memory usages of hashCode(), e.g. when an object is stored in a Java collection, should be fine, since they are not persisted across node runs (no traces in state snapshots).


artemananiev · 2025-02-12T00:42:12Z

Some interesting facts:

Arrays.hashCode() only works with full arrays; there is no way to hash part of an array (offset + length). This makes the method unusable for sliced Bytes objects (created using Bytes.slice()).
The value returned by Arrays.hashCode() is not equal to the existing Bytes.hashCode().
Arrays.hashCode() calls ArraysSupport.vectorizedHashCode(), which is annotated with @IntrinsicCandidate. This means the method may or may not be optimized by the VM.
Optimized and unoptimized versions of ArraysSupport.vectorizedHashCode() produce different values. We need a stable and predictable hashCode() implementation for Bytes, so Arrays.hashCode() doesn't look like a viable option.
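The first two points can be reproduced with plain arrays. This is a sketch only: `reverseHash` is a hypothetical helper that mirrors the arithmetic of the existing Bytes.hashCode() over an array range, and the slice is modelled as an offset and length into a backing array rather than a real Bytes object.

```java
import java.util.Arrays;

class ArraysHashCodeDemo {

    // Hypothetical helper: same formula as the existing Bytes.hashCode(),
    // applied to data[offset .. offset + length - 1], last byte first.
    static int reverseHash(byte[] data, int offset, int length) {
        int h = 1;
        for (int i = offset + length - 1; i >= offset; i--) {
            h = 31 * h + data[i];
        }
        return h;
    }

    // The public Arrays.hashCode(byte[]) overload takes a whole array,
    // so hashing a slice with it requires copying the range first.
    static int arraysHashOfSlice(byte[] data, int offset, int length) {
        return Arrays.hashCode(Arrays.copyOfRange(data, offset, offset + length));
    }

    public static void main(String[] args) {
        byte[] backing = {9, 1, 2, 9}; // the slice of interest is {1, 2}

        System.out.println(reverseHash(backing, 1, 2));       // 1024
        System.out.println(arraysHashOfSlice(backing, 1, 2)); // 994
    }
}
```

Both methods use the same 31-based polynomial, but Arrays.hashCode() walks the array first to last while the existing loop walks it last to first, hence the different values.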

And some numbers:

On Mac, Arrays.hashCode() is 50-60% faster than the existing Bytes.hashCode()
I suspect Arrays.hashCode() is not optimized by the JVM on Mac. With some tweaks to our current implementation, it can be made almost as fast as Arrays.hashCode()
On Linux, Arrays.hashCode() is 3-4 times faster than the current implementation
The unoptimized version of Arrays.hashCode() is about 25% faster than the current implementation, but with the same tweaks the difference can be eliminated

So I am going to repurpose this ticket to improve the existing Bytes.hashCode() rather than to provide a new "fast" version of this method. These improvements will produce the same values as hashCode() does today, i.e. the new implementation will be fully backwards compatible. It will also be predictable: we don't have to assume the JVM will or will not run some optimizations (which result in different hash code values), because hash codes will always be calculated the very same way.
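As an illustration of how a faster loop can still return identical values, here is one possible tweak. It is an assumption for the sake of example, not necessarily what the eventual fix does: read the backing array directly and fold four bytes per iteration. The class and method names are hypothetical.

```java
class CompatibleHashDemo {

    // Reference: the existing formula over data[offset .. offset + length - 1], last byte first.
    static int simpleHash(byte[] data, int offset, int length) {
        int h = 1;
        for (int i = offset + length - 1; i >= offset; i--) {
            h = 31 * h + data[i];
        }
        return h;
    }

    // Four steps of h = 31 * h + b folded into one expression.
    // Constants are 31^4, 31^3 and 31^2; int overflow wraps the same way in both versions.
    static int unrolledHash(byte[] data, int offset, int length) {
        int h = 1;
        int i = offset + length - 1;
        for (; i - 3 >= offset; i -= 4) {
            h = 923521 * h
                    + 29791 * data[i]
                    + 961 * data[i - 1]
                    + 31 * data[i - 2]
                    + data[i - 3];
        }
        // Remaining 0 to 3 bytes at the start of the range.
        for (; i >= offset; i--) {
            h = 31 * h + data[i];
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[32];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 37 - 100); // arbitrary mix of positive and negative bytes
        }
        // Compare both versions for every length and a couple of offsets.
        for (int offset = 0; offset < 3; offset++) {
            for (int length = 0; offset + length <= data.length; length++) {
                if (simpleHash(data, offset, length) != unrolledHash(data, offset, length)) {
                    throw new IllegalStateException("mismatch at offset " + offset + ", length " + length);
                }
            }
        }
        System.out.println("all values match");
    }
}
```

Because the unrolled expression is just the original recurrence expanded algebraically, the result does not depend on whether the JVM applies any particular optimization.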

artemananiev added the Performance label (issues related to performance concerns) Feb 7, 2025

artemananiev self-assigned this Feb 7, 2025

artemananiev mentioned this issue Feb 7, 2025

Virtual Mega Map hiero-ledger/hiero-consensus-node#14395

Open

artemananiev changed the title ~~Bytes.fastHashCode()~~ Bytes.hashCode() can be optimized Feb 12, 2025

artemananiev mentioned this issue Feb 13, 2025

fix: 389: Bytes.hashCode() can be optimized #391

Merged

artemananiev closed this as completed in #391 Feb 13, 2025

artemananiev closed this as completed in 381ac5b Feb 13, 2025

artemananiev added this to the 0.9.17 milestone Feb 13, 2025
