Bloom Filter

A Bloom Filter is a probabilistic data structure that is used to test whether an element is a member of a set. It is highly space-efficient but allows for a small probability of false positives.

How It Works

Initialization: A Bloom Filter is initialized with an array of bits set to 0.
Hash Functions: Multiple independent hash functions are used to hash each element.
Insertion: To add an element, it is hashed using all hash functions, and the corresponding bits in the array are set to 1.
Query: To check if an element is in the set, it is hashed using all hash functions. If all corresponding bits are 1, the element is probably in the set; if any bit is 0, the element is definitely not in the set.

Advantages

Space Efficient: Requires less memory compared to other data structures.
Speed: Fast insertion and query operations.

Disadvantages

False Positives: May return false positives, meaning it might indicate that an element is in the set when it is not.
No Deletion: Standard Bloom Filters do not support deletion of elements.

Use Cases

Databases: To quickly check if an element is present in a large dataset.
Networking: To prevent cache poisoning in distributed systems.
Security: To detect malicious URLs or spam.

Example

from bloom_filter import BloomFilter

# Initialize Bloom Filter with expected number of elements and false positive rate
bloom = BloomFilter(max_elements=1000, error_rate=0.1)

# Add elements
bloom.add("apple")
bloom.add("banana")

# Query elements
print("apple" in bloom)  # Output: True
print("cherry" in bloom) # Output: False

References

Wikipedia: Bloom Filter
Bloom Filter Python Library

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

readme.md

readme.md

Bloom Filter

How It Works

Advantages

Disadvantages

Use Cases

Example

References

Files

readme.md

Latest commit

History

readme.md

File metadata and controls

Bloom Filter

How It Works

Advantages

Disadvantages

Use Cases

Example

References