Skip to content

An enhanced bloom filter implementation in nim with configurable hash

License

Notifications You must be signed in to change notification settings

waku-org/nim-bloom-filter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nim-bloom

NOTE: THIS IMPLEMENTATION IS NOT PEER-REVIEWED YET. PLEASE USE WITH CAUTION.

A high-performance Bloom filter implementation in Nim offering standard and custom hash function options with different performance characteristics and false positive rates.

Features

  • Fast string element insertion and lookup
  • Configurable error rates
  • Choice between standard Nim hash and custom MurmurHash3 (128-bit or 32-bit)
  • Optimized for supporting different use cases of speed and accuracy
  • Comprehensive test suite and benchmarks

Usage

Basic usage (defaults to MurmurHash3_128):

import bloom2

# Initialize with default hash (MurmurHash3_128)
var bf = initializeBloomFilter(capacity = 10000, errorRate = 0.01)

# Or explicitly specify hash type
var bf32 = initializeBloomFilter(
  capacity = 10000, 
  errorRate = 0.01,
  hashType = htMurmur32  # Use 32-bit implementation
)

# Basic operations
bf.insert("test")
assert bf.lookup("test")

Hash Function Selection

  1. Use MurmurHash3_128 (default) when:

    • You need the best balance of performance and accuracy
    • Memory isn't severely constrained
    • Working with large datasets
    • False positive rates are important
  2. Use MurmurHash3_32 when:

    • Running on 32-bit systems
    • Memory is constrained
    • Working with smaller datasets
    • String concatenation overhead for second hash, causing higher insertion and lookup times, is acceptable.
  3. Use NimHash when:

    • Consistency with Nim's hashing is important
    • Working with smaller datasets where performance is less critical
    • Future availability of better hash functions or performant implementations

Nim's Hash Implementation:

  • Default (no flags): Uses FarmHash implementation
  • With -d:nimStringHash2: Uses Nim's MurmurHash3_32 implementation
  • Our implementation allows explicit choice regardless of compilation flags and our MurmurHash3_32 performs better because of directly using a native C Implementation

Performance Characteristics

For 1M items - Random Strings

Insertion Speed:
MurmurHash3_128: ~6.8M ops/sec
MurmurHash3_32:  ~5.9M ops/sec
FarmHash:        ~2.1M ops/sec

False Positive Rates:
MurmurHash3_128: ~0.84%
MurmurHash3_32:  ~0.83%
FarmHash:        ~0.82%

These measurements show MurmurHash3_128's balanced performance profile, offering best speed and competitive false positive rates.

Performance will vary based on:

  • Choice of hash function
  • Hardware specifications
  • Data size and memory access patterns (inside vs outside cache)
  • Compiler optimizations

For detailed benchmarks across different data patterns and sizes, see benches.

Implementation Details

Double Hashing Technique

This implmentation uses the Kirsch-Mitzenmacher method to generate k hash values from two initial hashes. The implementation varies by hash type:

  1. MurmurHash3_128:
h(i) = abs((hash1 + i * hash2) mod m)
  • Uses both 64-bit hashes from 128-bit output
  • Natural double-hash implementation
  1. MurmurHash3_32:
let baseHash = murmurHash32(item, 0'u32)
let secondHash = murmurHash32(item & " b", 0'u32)
  • Uses string concatention by default for the second hash
  • Bit Rotation for second hash provides sufficient randomness in some use cases while being much faster than string concatenation (but results in higher FP rate)
  • Choose between bit rotation or string concatenation as per your use-case.
  1. Nim's Hash:
  let
    hashA = abs(hash(item)) mod maxValue
    hashB = abs(hash(item & " b")) mod maxValue
  h(i) = abs((hashA + n * hashB)) mod maxValue
  • Farm Hash or Nim's Murmur Hash based (if compliation flag is passed)
  • Uses string concatention by default.
  • Lower FP rate than bit rotation but comes at the cost of higher insertion and lookup times.

Tip: Bit rotation values can be configurable as well. Use prime numbers for better mixing: 7, 11, 13, 17 for 32-bit; 21, 23, 27, 33 for 64-bit. Smaller rotations provides lesser mixing but as faster than higher rotations.

Testing

Run the test suite:

nimble test

About

An enhanced bloom filter implementation in nim with configurable hash

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published