nim-bloom

NOTE: THIS IMPLEMENTATION IS NOT PEER-REVIEWED YET. PLEASE USE WITH CAUTION.

A high-performance Bloom filter implementation in Nim offering standard and custom hash function options with different performance characteristics and false positive rates.

Features

Fast string element insertion and lookup
Configurable error rates
Choice between standard Nim hash and custom MurmurHash3 (128-bit or 32-bit)
Optimized for supporting different use cases of speed and accuracy
Comprehensive test suite and benchmarks

Usage

Basic usage (defaults to MurmurHash3_128):

import bloom2

# Initialize with default hash (MurmurHash3_128)
var bf = initializeBloomFilter(capacity = 10000, errorRate = 0.01)

# Or explicitly specify hash type
var bf32 = initializeBloomFilter(
  capacity = 10000, 
  errorRate = 0.01,
  hashType = htMurmur32  # Use 32-bit implementation
)

# Basic operations
bf.insert("test")
assert bf.lookup("test")

Hash Function Selection

Use MurmurHash3_128 (default) when:
- You need the best balance of performance and accuracy
- Memory isn't severely constrained
- Working with large datasets
- False positive rates are important
Use MurmurHash3_32 when:
- Running on 32-bit systems
- Memory is constrained
- Working with smaller datasets
- String concatenation overhead for second hash, causing higher insertion and lookup times, is acceptable.
Use NimHash when:
- Consistency with Nim's hashing is important
- Working with smaller datasets where performance is less critical
- Future availability of better hash functions or performant implementations

Nim's Hash Implementation:

Default (no flags): Uses FarmHash implementation
With -d:nimStringHash2: Uses Nim's MurmurHash3_32 implementation
Our implementation allows explicit choice regardless of compilation flags and our MurmurHash3_32 performs better because of directly using a native C Implementation

Performance Characteristics

For 1M items - Random Strings

Insertion Speed:
MurmurHash3_128: ~6.8M ops/sec
MurmurHash3_32:  ~5.9M ops/sec
FarmHash:        ~2.1M ops/sec

False Positive Rates:
MurmurHash3_128: ~0.84%
MurmurHash3_32:  ~0.83%
FarmHash:        ~0.82%

These measurements show MurmurHash3_128's balanced performance profile, offering best speed and competitive false positive rates.

Performance will vary based on:

Choice of hash function
Hardware specifications
Data size and memory access patterns (inside vs outside cache)
Compiler optimizations

For detailed benchmarks across different data patterns and sizes, see benches.

Implementation Details

Double Hashing Technique

This implmentation uses the Kirsch-Mitzenmacher method to generate k hash values from two initial hashes. The implementation varies by hash type:

MurmurHash3_128:

h(i) = abs((hash1 + i * hash2) mod m)

Uses both 64-bit hashes from 128-bit output
Natural double-hash implementation

MurmurHash3_32:

let baseHash = murmurHash32(item, 0'u32)
let secondHash = murmurHash32(item & " b", 0'u32)

Uses string concatention by default for the second hash
Bit Rotation for second hash provides sufficient randomness in some use cases while being much faster than string concatenation (but results in higher FP rate)
Choose between bit rotation or string concatenation as per your use-case.

Nim's Hash:

  let
    hashA = abs(hash(item)) mod maxValue
    hashB = abs(hash(item & " b")) mod maxValue
  h(i) = abs((hashA + n * hashB)) mod maxValue

Farm Hash or Nim's Murmur Hash based (if compliation flag is passed)
Uses string concatention by default.
Lower FP rate than bit rotation but comes at the cost of higher insertion and lookup times.

Tip: Bit rotation values can be configurable as well. Use prime numbers for better mixing: 7, 11, 13, 17 for 32-bit; 21, 23, 27, 33 for 64-bit. Smaller rotations provides lesser mixing but as faster than higher rotations.

Testing

Run the test suite:

nimble test

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
benches		benches
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bloom.nimble		bloom.nimble

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nim-bloom

Features

Usage

Hash Function Selection

Performance Characteristics

For 1M items - Random Strings

Implementation Details

Double Hashing Technique

Testing

About

Releases

Packages

Languages

License

waku-org/nim-bloom-filter

Folders and files

Latest commit

History

Repository files navigation

nim-bloom

Features

Usage

Hash Function Selection

Performance Characteristics

For 1M items - Random Strings

Implementation Details

Double Hashing Technique

Testing

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages