A Bloom Filter is a probabilistic data structure that is used to test whether an element is a member of a set. It is highly space-efficient but allows for a small probability of false positives.
- Initialization: A Bloom Filter is initialized with an array of bits set to 0.
- Hash Functions: Multiple independent hash functions are used to hash each element.
- Insertion: To add an element, it is hashed using all hash functions, and the corresponding bits in the array are set to 1.
- Query: To check if an element is in the set, it is hashed using all hash functions. If all corresponding bits are 1, the element is probably in the set; if any bit is 0, the element is definitely not in the set.
- Space Efficient: Requires less memory compared to other data structures.
- Speed: Fast insertion and query operations.
- False Positives: May return false positives, meaning it might indicate that an element is in the set when it is not.
- No Deletion: Standard Bloom Filters do not support deletion of elements.
- Databases: To quickly check if an element is present in a large dataset.
- Networking: To prevent cache poisoning in distributed systems.
- Security: To detect malicious URLs or spam.
from bloom_filter import BloomFilter
# Initialize Bloom Filter with expected number of elements and false positive rate
bloom = BloomFilter(max_elements=1000, error_rate=0.1)
# Add elements
bloom.add("apple")
bloom.add("banana")
# Query elements
print("apple" in bloom) # Output: True
print("cherry" in bloom) # Output: False