Skip to content

rsarosh/BloomFilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 

Repository files navigation

Bloom Filter

A Bloom Filter is a probabilistic data structure that is used to test whether an element is a member of a set. It is highly space-efficient but allows for a small probability of false positives.

How It Works

  1. Initialization: A Bloom Filter is initialized with an array of bits set to 0.
  2. Hash Functions: Multiple independent hash functions are used to hash each element.
  3. Insertion: To add an element, it is hashed using all hash functions, and the corresponding bits in the array are set to 1.
  4. Query: To check if an element is in the set, it is hashed using all hash functions. If all corresponding bits are 1, the element is probably in the set; if any bit is 0, the element is definitely not in the set.

Advantages

  • Space Efficient: Requires less memory compared to other data structures.
  • Speed: Fast insertion and query operations.

Disadvantages

  • False Positives: May return false positives, meaning it might indicate that an element is in the set when it is not.
  • No Deletion: Standard Bloom Filters do not support deletion of elements.

Use Cases

  • Databases: To quickly check if an element is present in a large dataset.
  • Networking: To prevent cache poisoning in distributed systems.
  • Security: To detect malicious URLs or spam.

Example

from bloom_filter import BloomFilter

# Initialize Bloom Filter with expected number of elements and false positive rate
bloom = BloomFilter(max_elements=1000, error_rate=0.1)

# Add elements
bloom.add("apple")
bloom.add("banana")

# Query elements
print("apple" in bloom)  # Output: True
print("cherry" in bloom) # Output: False

References

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages