Support for read-only files (#13)
This is great, thanks! I'll prep a release today and get it on pypi!
prashnts authored Nov 25, 2019
2 parents d0bb10a + cb87e00 commit a0a75fc
Showing 5 changed files with 140 additions and 72 deletions.
30 changes: 15 additions & 15 deletions docs/index.rst
@@ -13,30 +13,30 @@ is a member of a set. The `wikipedia page <http://en.wikipedia.org/wiki/Bloom_fi
has further information on their nature. This module implements a Bloom filter
in python that's fast and uses mmap files for better scalability.

Here's a quick example::
Here's a quick example:

.. code:: python
from pybloomfilter import BloomFilter
.. code-block:: python
bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> from pybloomfilter import BloomFilter
with open("/usr/share/dict/words") as f:
for word in f:
bf.add(word.rstrip())
>>> bf = BloomFilter(10000000, 0.01, 'filter.bloom')
>>> with open("/usr/share/dict/words") as f:
>>> for word in f:
>>> bf.add(word.rstrip())
print 'apple' in bf
#outputs True
>>> print 'apple' in bf
True
That wasn't so hard, was it? Now, there are a lot of other things
we can do. For instance, let's say we want to create a similar
filter with just a few pieces of fruit::
filter with just a few pieces of fruit:

.. code:: python
fruitbf = bf.copy_template("fruit.bloom")
fruitbf.update(("apple", "banana", "orange", "pear"))
print fruitbf.to_base64()
>>> fruitbf = bf.copy_template("fruit.bloom")
>>> fruitbf.update(("apple", "banana", "orange", "pear"))
>>> print(fruitbf.to_base64())
"eJzt2k13ojAUBuA9f8WFyofF5TWChlTHaPzqrlqFCtj6gQi/frqZM2N7aq3Gis59d2ye85KTRbhk"
"0lyu1NRmsQrgRda0I+wZCfXIaxuWv+jqDxA8vdaf21HIOSn1u6LRE0VL9Z/qghfbBmxZoHsqM3k8"
"N5XyPAxH2p22TJJoqwU9Q0y0dNDYrOHBIa3BwuznapG+KZZq69JUG0zu1tqI5weJKdpGq7PNJ6tB"
@@ -76,7 +76,7 @@ Install
Please have `Cython` installed. Please note that this version is for Python 3.
In case you are using Python 2, please see https://github.com/axiak/pybloomfiltermmap.

To install:
To install::

$ pip install cython
$ pip install pybloomfiltermmap3
44 changes: 24 additions & 20 deletions docs/ref.rst
@@ -10,12 +10,12 @@ BloomFilter Class Reference
.. moduleauthor:: Michael Axiak <[email protected]>


.. class:: BloomFilter(capacity : int, error_rate : float, [filename=None : string], [perm=0755])
.. class:: BloomFilter(capacity: int, error_rate: float, [filename = None: string], [mode = "rw+"], [perm=0755])

Create a new BloomFilter object with a given capacity and error_rate.
**Note that we do not check capacity.** This is important, because
I want to be able to support logical OR and AND (see below).
The capacity and error_rate then together serve as a contract---you add
we want to be able to support logical OR and AND (see below).
The capacity and error_rate then together serve as a contract --- you add
less than capacity items, and the Bloom Filter will have an error rate
less than error_rate.
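
The contract described above can be made concrete with the textbook Bloom filter sizing formulas. The following is a minimal sketch under that assumption: `bloom_parameters` and `expected_error_rate` are hypothetical helpers written for illustration, and the library itself may size its bit array differently.

```python
import math


def bloom_parameters(capacity, error_rate):
    """Textbook sizing: bits m = -n*ln(p)/(ln 2)^2, hashes k = (m/n)*ln 2.

    Hypothetical helper for illustration, not the library's own routine.
    """
    num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / capacity * math.log(2)))
    return num_bits, num_hashes


def expected_error_rate(num_bits, num_hashes, items):
    """Approximate false-positive rate after `items` insertions."""
    return (1.0 - math.exp(-num_hashes * items / num_bits)) ** num_hashes


if __name__ == "__main__":
    m, k = bloom_parameters(10_000_000, 0.01)
    for n in (5_000_000, 10_000_000, 20_000_000):
        print(n, expected_error_rate(m, k, n))
```

Because capacity is not checked, nothing prevents adding more items than the capacity; under this model the error rate simply rises above the requested value as the filter is overfilled.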

@@ -24,7 +24,7 @@ Class Methods

.. classmethod:: BloomFilter.open(filename)

Return a BloomFilter object using an already-existing Bloomfilter file.
Return a BloomFilter object using an already existing BloomFilter file.

.. classmethod:: BloomFilter.from_base64(filename, string, [perm=0755])

@@ -35,11 +35,11 @@ Class Methods
Example::

>>> bf = BloomFilter.from_base64("/tmp/mike.bf",
"eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
"qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
"Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
"zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
"gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
"eJwFwcuWgiAAANC9v+JCx7By0QKt0GHEbKSknflAQ9QmTyRfP/fW5E9XTRSX"
"qcLlqGNXphAqcfVH\nRoNv0n4JlTpIvAP0e1+RyXX6I637ggA+VPZnTYR1A4"
"Um5s9geYaZZLiT208JIiG3iwhf3Fwlzb3Y\n5NRL4uNQS6/d9OvTDJbnZMnR"
"zcrplOX5kmsVIkQziM+vw4hCDQ3OkN9m3WVfPWzGfaTeRftMCLws\nPnzEzs"
"gjAW60xZTBbj/bOAgYbK50PqjdzvgHZ6FHZw==\n")
>>> "MIKE" in bf
True

@@ -60,15 +60,20 @@ Instance Attributes

.. attribute:: BloomFilter.name

The file name (compatible with file objects)
The file name (compatible with file objects).

.. attribute:: BloomFilter.num_bits

The number of bits used in the filter as buckets
The number of bits used in the filter as buckets.

.. attribute:: BloomFilter.num_hashes

The number of hash functions used when computing
The number of hash functions used when computing.

.. attribute:: BloomFilter.read_only

Boolean, indicating if the opened BloomFilter is read-only.
Always ``False`` for an in-memory BloomFilter.


Instance Methods
@@ -78,8 +83,8 @@ Instance Methods

Add the item to the bloom filter.

:param item: Hashable object
:rtype: Boolean (True if item already in the filter)
:param item: hashable object
:rtype: boolean (``True`` if item already in the filter)

.. method:: BloomFilter.clear_all()

@@ -121,7 +126,7 @@ Instance Methods
this may not be too useful. I find it useful for debugging so I can
copy filters from one terminal to another in their entirety.

:rtype: Base64 encoded string representing filter
:rtype: base64 encoded string representing filter
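
The sample strings in these docs begin with `eJ`, which is what base64 produces for the two-byte header of a default zlib stream. The sketch below round-trips arbitrary bytes through zlib and base64 to show that shape. It assumes this encoding purely for illustration: `encode_blob` and `decode_blob` are hypothetical helpers, and the actual serialized layout of a filter is not documented here.

```python
import base64
import zlib


def encode_blob(raw: bytes) -> str:
    """Compress with zlib, then base64-encode (hypothetical helper)."""
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Reverse of encode_blob: base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


if __name__ == "__main__":
    raw = bytes(64)  # stand-in for a filter's bit array
    text = encode_blob(raw)
    print(text[:2], decode_blob(text) == raw)
```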

.. method:: BloomFilter.update(iterable)

@@ -136,7 +141,7 @@ Instance Methods

The result will occur **in place**. That is, calling::

bf.union(bf2)
bf.union(bf2)

is a way to add all the elements of bf2 to bf.

@@ -147,7 +152,7 @@ Instance Methods

The same as union() above except it uses a set AND instead of a
set OR.

*N.B.: Calling this function will render future calls to len()
invalid.*
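
To see why union and intersection work in place and why they invalidate len(), consider a toy filter whose bit array is a single Python integer. This is a sketch for illustration only: `TinyBloom` is a hypothetical class with a made-up hashing scheme, not the library's implementation.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter backed by one integer (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def _check_compatible(self, other):
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")

    def union(self, other):
        # Set OR: every bit set in either filter stays set.
        self._check_compatible(other)
        self.bits |= other.bits
        return self

    def intersection(self, other):
        # Set AND: only bits set in both filters survive.
        self._check_compatible(other)
        self.bits &= other.bits
        return self
```

Both operations combine raw bits, so the filter no longer knows how many distinct items were added, which is consistent with len() becoming invalid afterwards.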

@@ -182,11 +187,11 @@ Magic Methods

.. method:: BloomFilter.__ior__(filter) -> BloomFilter

See union(filter)
See :meth:`BloomFilter.union`.

.. method:: BloomFilter.__iand__(filter) -> BloomFilter

See intersection(filter)
See :meth:`BloomFilter.intersection`.

Exceptions
--------------
@@ -195,4 +200,3 @@ Exceptions

The exception that is raised if len() is called on a BloomFilter
object after |=, &=, intersection(), or union() is used.
5 changes: 4 additions & 1 deletion src/mmapbitarray.c
@@ -66,6 +66,7 @@ MBArray * mbarray_Create_Mmap(BTYPE num_bits, const char * file, const char * he
MBArray * array = (MBArray *)malloc(sizeof(MBArray));
uint64_t filesize;
int32_t fheaderlen;
+ int mmap_flags = PROT_READ;

if (!array || errno) {
return NULL;
@@ -148,9 +149,11 @@ MBArray * mbarray_Create_Mmap(BTYPE num_bits, const char * file, const char * he
}

errno = 0;
+ // Add PROT_WRITE if we have write permissions
+ mmap_flags |= (oflag & O_RDWR) ? PROT_WRITE : 0;
array->vector = (DTYPE *)mmap(NULL,
_mmap_size(array),
- PROT_READ | PROT_WRITE,
+ mmap_flags,
MAP_SHARED,
array->fd,
0);
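
The idea in this hunk, requesting write protection on the mapping only when the file was opened for writing, can be sketched with Python's standard `mmap` module on a Unix system. The names `map_with_matching_protection` and `demo` are hypothetical, and masking with `os.O_ACCMODE` is a choice made for this sketch rather than what the C change does.

```python
import mmap
import os
import tempfile


def map_with_matching_protection(path, oflag):
    """Map a file, adding PROT_WRITE only if it was opened read-write."""
    fd = os.open(path, oflag)
    try:
        prot = mmap.PROT_READ
        if (oflag & os.O_ACCMODE) == os.O_RDWR:
            prot |= mmap.PROT_WRITE
        size = os.fstat(fd).st_size
        # mmap duplicates the descriptor, so closing fd afterwards is safe.
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=prot)
    finally:
        os.close(fd)


def demo():
    """Read through a read-only mapping, then write through a read-write one."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"bloom")
    try:
        ro = map_with_matching_protection(tmp.name, os.O_RDONLY)
        first = ro[:5]
        ro.close()

        rw = map_with_matching_protection(tmp.name, os.O_RDWR)
        rw[0:1] = b"B"
        rw.flush()
        rw.close()

        with open(tmp.name, "rb") as f:
            after = f.read()
        return first, after
    finally:
        os.unlink(tmp.name)
```

Mapping a file opened read-only with `PROT_READ | PROT_WRITE` and `MAP_SHARED` is rejected by the operating system, which is why the protection has to follow the open mode for read-only filters to load.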