Skip to content

OkBloomer, a novel autoscaling bloom filter with ultra-low memory footprint.

License

Notifications You must be signed in to change notification settings

andrewdalpino/PyBloomer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ok Bloomer

An implementation of the OkBloomer algorithm, an autoscaling Bloom filter with ultra-low memory footprint for Python. Ok Bloomer employs a novel layered filtering strategy that allows it to expand while maintaining an upper bound on the false positive rate. As such, Ok Bloomer is suitable for streaming data where the size is not known a priori.

  • Ultra-low memory footprint
  • Autoscaling works on streaming data
  • Bounded maximum false positive rate
  • Open-source and free to use commercially

Installation

Install DNA Hash using a Python package manager, example pip:

pip install okbloomer

Parameters

# Name Default Type Description
1 max_false_positive_rate 0.01 float The upper bound on the false positivity rate.
2 num_hashes 4 int The number of hash functions used, i.e. the number of slices per layer.
3 layer_size 32000000 int The size of each layer of the filter in bits. Ideal sizes can be divided evenly by num_hashes.

Example Usage

import okbloomer

filter = okbloomer.BloomFilter(
    max_false_positive_rate=0.01,
    num_hashes=4,
    layer_size=32000000,
)

filter.insert('foo')

print(filter.exists('foo'))

print(filter.existsOrInsert('bar'))

print(filter.exists('bar'))

print(filter.false_positive_rate())
True 

False

True

3.906249999999999e-27

References

  • [1] A. DalPino. (2021). OkBloomer, a novel autoscaling Bloom Filter [link].
  • [2] K. Christensen, et al. A New Analysis of the False-Positive Rate of a Bloom Filter.

About

OkBloomer, a novel autoscaling bloom filter with ultra-low memory footprint.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages