实现布隆过滤器

什么是布隆过滤器

布隆过滤器是一直支持快速查询的数据结构。

如何实现布隆过滤器

首先是定义多个hash函数，可以自己实现，主要是为了让字符串可以分配到不同的bit上。

def hash_one(s):
  result = 0
  for char in s:
    result += ord(char) % 1024
  return result

def hash_two(s):
  result = 0
  for char in s:
    result += ord(char) ** 2 / 1024
  return result

print(hash_one("foo"))
# Print 324
print(hash_two("324"))
# Print 34

然后初始化bitmap，当我们加入“foo”这个字符串时，就更新bitmap。

bitmap = [False for x in range(1024)]

word = "foo"
word_hash_one = hash_one(word)
word_hash_two = hash_two(word)

bitmap[word_hash_one] = True
bitmap[word_hash_two] = True

然后查询的话，同样使用相同的hash函数，对bitmap相应的bit进行判断即可。

def check_in_bloom_filter(bitmap, word):
  # bitmap is the bit map
  # word is the string

  hashs = []
  hashs.append(hash_one(word))
  hashs.append(hash_two(word))

  for i in hashs:
    if bitmap[i] == False:
      return False
  return True

print(check_in_bloom_filter(bitmap, "foo"))
# Print True
print(check_in_bloom_filter(bitmap, "bar"))
# Print False

这是本章内容，希望对你有所帮助。进入下一章

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

010布隆过滤器.md

010布隆过滤器.md

实现布隆过滤器

什么是布隆过滤器

如何实现布隆过滤器

Files

010布隆过滤器.md

Latest commit

History

010布隆过滤器.md

File metadata and controls

实现布隆过滤器

什么是布隆过滤器

如何实现布隆过滤器