
Commit

Version 1.0
kochanczyk committed Aug 16, 2018
1 parent 17887a7 commit 1d87b5b
Showing 10 changed files with 64 additions and 51 deletions.
21 changes: 11 additions & 10 deletions Makefile
@@ -1,20 +1,21 @@
# Run unit tests
test:
python3 -m unittest discover -s tests > /dev/null
@echo "Running unit tests..."
@python3 -m unittest discover -s tests > /dev/null

# Check test coverage using unittest module
coverage:
coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage report
@echo "Checking test coverage using unittest module..."
@coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage report

# Check test coverage and show the results in browser
html:
coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage html; python -m webbrowser "./htmlcov/index.html" &
@echo "Checking test coverage and showing the results in browser..."
@coverage run --source=cce -m unittest discover -s tests > /dev/null; coverage html; python -m webbrowser "./htmlcov/index.html" &

# Check compatibility with Python 2.7
comp:
python2 -m unittest discover -s tests > /dev/null
@echo "Checking compatibility with Python 2.7..."
@python2 -m unittest discover -s tests > /dev/null

install:
pip install .
@echo "Installing cce module via pip..."
@pip install .

.PHONY: test
.PHONY: test coverage html comp install
52 changes: 32 additions & 20 deletions Readme.rst
@@ -4,8 +4,8 @@ Channel Capacity Estimator

Channel Capacity Estimator (**cce**) is a Python module to estimate
`information capacity`_ of a communication channel. Mutual information,
computed as proposed by `Kraskov et al.` (*Physical Review E*, 2004)
Eq. (8), is maximized over input probabilities by means of a constrained
computed as proposed by `Kraskov et al.`_ (*Physical Review E*, 2004,
Eq. (8)), is maximized over input probabilities by means of a constrained
gradient-based stochastic optimization. The only parameter of the Kraskov
algorithm is the number of neighbors, *k*, used in the nearest neighbor
search. In **cce**, channel input is expected to be of categorical type
@@ -19,8 +19,9 @@ requirements.txt for a complete list of dependencies.

Module **cce** accompanies the research article "Limits to the rate of
information transmission through MAPK pathway" by Grabowski *et al.*,
submitted to *PLOS Computational Biology* in 2018. Release 0.4 of the
code has been included as supplementary data of this article.
submitted to *PLOS Computational Biology* (2018). Version 1.0 of **cce**
(with pre-built documentation) has been included as supplementary code
of this article.

For any updates and fixes to **cce**, please visit project homepage:
http://pmbm.ippt.pan.pl/software/cce
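
For orientation, the quantity that **cce** estimates and then maximizes is the
Kraskov *et al.* Eq. (8) mutual information between a categorical input and a
continuous output. The brute-force sketch below only illustrates that formula
and is not the module's implementation; **cce** performs the neighbor searches
with a k-d tree (`cKDTree`), and details such as tie handling and the
logarithm base may differ.

.. code:: python

    import numpy as np
    from scipy.special import digamma

    def kraskov_mi_sketch(labeled_points, k=10):
        """Kraskov et al. (2004), Eq. (8), in nats.

        Assumes every label occurs more than k times and that points are
        unique (cce adds tiny noise to duplicates via add_noise_if_duplicates).
        """
        labels = np.array([label for label, _ in labeled_points])
        points = np.array([np.atleast_1d(p) for _, p in labeled_points], dtype=float)
        n = len(points)
        n_x, n_y = np.empty(n), np.empty(n)
        for i in range(n):
            same = labels == labels[i]
            # Output-space (Y) distances in the max-norm, as in the original paper.
            d_y = np.max(np.abs(points - points[i]), axis=1)
            # Distance to the k-th nearest neighbor in the joint space X x Y;
            # with the categorical X immersed so that distinct labels lie
            # "infinitely" far apart, this neighbor must share the label of i.
            eps = np.sort(d_y[same])[k]               # index 0 is the point itself
            n_x[i] = np.count_nonzero(same) - 1       # same label => zero X-distance
            n_y[i] = np.count_nonzero(d_y < eps) - 1  # strictly within eps, minus self
        return digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))
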
@@ -38,7 +39,7 @@ There are three major use cases of **cce**:

In the example below, mutual information is calculated between three sets
of points drawn at random from two-dimensional Gaussian distributions,
located at (0,0), (1,1), and at (3,3) (in SciPy, covariance matrices of
located at (0,0), (0,1), and at (3,3) (in SciPy, covariance matrices of
all three distributions by default are identity matrices). Auxiliary
function `label_all_with` helps to prepare the list of all points, in
which each point is labeled according to its distribution of origin.
@@ -51,15 +52,16 @@ which each point is labeled according to its distribution of origin.
>>> def label_all_with(label, values): return [(label, v) for v in values]
>>>
>>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
+ label_all_with('C', mvn(mean=(3,3)).rvs(10000))
>>>
>>> wke(data).calculate_mi(k=50)
0.9386627422798913
>>> wke(data).calculate_mi(k=10)
0.9552107248613955
In this example, probabilities of input distributions, henceforth referred
to as *weights*, are assumed to be equal for all input distributions. Format
of data is akin to [('A', array([-0.4, 2.8])), ('A', array([-0.9, -0.1])), ..., ('B', array([1.7, 0.9])), ..., ('C', array([3.2, 3.3])), ...).
of data is akin to [('A', array([-0.4, 2.8])), ('A', array([-0.9, -0.1])),
..., ('B', array([1.7, 0.9])), ..., ('C', array([3.2, 3.3])), ...].
Entries of data are not required to be grouped according to the label.
Distribution labels can be given as strings, not just single characters.
Instead of NumPy arrays, ordinary lists with coordinates will be also
@@ -81,12 +83,12 @@ input distributions:
>>> def label_all_with(label, values): return [(label, v) for v in values]
>>>
>>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
+ label_all_with('C', mvn(mean=(3,3)).rvs(10000))
>>>
>>> weights = {'A': 3/6, 'B': 1/6, 'C': 2/6}
>>> wke(data).calculate_weighted_mi(weights=weights, k=50)
0.9420502318804324
>>> weights = {'A': 2/6, 'B': 1/6, 'C': 3/6}
>>> wke(data).calculate_weighted_mi(weights=weights, k=10)
1.0065891280377155
(This example involves random numbers, so your result may vary slightly.)
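
Because `calculate_weighted_mi` accepts any assignment of weights, the
dependence of mutual information on the input probabilities can also be probed
by hand. A minimal sketch, reusing `data` and `wke` from the example above and
assuming the weights sum to one (values will again vary between runs):

>>> estimator = wke(data)
>>> for w in (0.2, 1/3, 0.5):
...     weights = {'A': w, 'B': (1 - w) / 2, 'C': (1 - w) / 2}
...     mi = estimator.calculate_weighted_mi(weights=weights, k=10)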

@@ -102,26 +104,38 @@ input distributions:
>>> def label_all_with(label, values): return [(label, v) for v in values]
>>>
>>> data = label_all_with('A', mvn(mean=(0,0)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(1,1)).rvs(10000)) \
+ label_all_with('B', mvn(mean=(0,1)).rvs(10000)) \
+ label_all_with('C', mvn(mean=(3,3)).rvs(10000))
>>>
>>> wke(data).calculate_maximized_mi(k=50)
(0.98616722147976, {'A': 0.38123083, 'B': 0.16443817, 'C': 0.45433092})
>>> wke(data).calculate_maximized_mi(k=10)
(1.0154510500713743, {'A': 0.33343804, 'B': 0.19158363, 'C': 0.4749783})
The output tuple contains the maximized mutual information (channel capacity)
and probabilities of input distributions that maximize mutual information (argmax).
Optimization is performed within TensorFlow with multiple threads and takes
less than a minute on a quad-core processor.
(This example involves random numbers, so your result may vary slightly.)
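
Since `calculate_maximized_mi` returns a pair, the capacity and the maximizing
weights are conveniently unpacked, and the weights can be fed back into
`calculate_weighted_mi` as a rough consistency check (a sketch; owing to the
stochastic optimization and finite sampling, the two values agree only
approximately):

>>> capacity, optimal_weights = wke(data).calculate_maximized_mi(k=10)
>>> mi_at_optimum = wke(data).calculate_weighted_mi(weights=optimal_weights, k=10)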


Testing
-------
To launch a suite of unit tests run:
To launch a suite of unit tests, run:

.. code:: bash
$ make test
Documentation
-------------
Developer's code documentation may be generated with

.. code:: bash
$ cd docs
$ make html
Installation
------------
To install **cce** locally via pip, run:
@@ -139,16 +153,14 @@ Then, you can directly start using the package:
>>> ...
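
A quick post-installation sanity check is to import the estimator class used
throughout the examples above and construct it on a toy data set (a sketch;
the import path follows the module layout, i.e., `cce/estimator.py`):

>>> from cce.estimator import WeightedKraskovEstimator as wke
>>> estimator = wke([('A', [0.0, 0.0]), ('A', [0.1, 0.2]),
...                  ('B', [1.0, 1.0]), ('B', [1.1, 0.9])])
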
Authors
-------

The code was developed by `Frederic Grabowski`_ and `Paweł Czyż`_,
with some guidance from `Marek Kochańczyk`_ and under the supervision of
`Tomasz Lipniacki`_ from the `Laboratory of Modeling in Biology and Medicine`_,
`Institute of Fundamental Technological Research, Polish Academy of Sciences`_
in Warsaw.
(IPPT PAN) in Warsaw.


License
12 changes: 6 additions & 6 deletions cce/estimator.py
@@ -8,9 +8,9 @@
from scipy.spatial import cKDTree
from scipy.special import digamma
import numpy as np
from cce.preprocess import normalize, add_noise_if_duplicates
from cce.optimize import weight_optimizer
from cce.score import weight_loss
from cce.preprocessing import normalize, add_noise_if_duplicates
from cce.optimization import weight_optimizer
from cce.scoring import weight_loss


class WeightedKraskovEstimator:
@@ -39,7 +39,7 @@ def __init__(self, data: list = None, leaf_size: int = 16):
self._number_of_points_for_label = defaultdict(lambda: 0)
self._number_of_labels = None

# Immersed data -- X is mapped from cateogorical data into reals
# Immersed data -- X is mapped from categorical data into reals
# using _huge_dist, and we store spaces X x Y and just Y.
self._immersed_data_full = None
self._immersed_data_coordinates = None
@@ -226,7 +226,7 @@ def calculate_weighted_mi(self, weights: dict, k: int) -> float:


def optimize_weights(self) -> tuple:
"""Function optimizing weights using weight_optimizer.
"""Optimizes probabilities of input distributions (weights).
Returns
-------
@@ -281,7 +281,7 @@ def _turn_into_neigh_list(self, indices, special_point_label):


def calculate_neighborhoods(self, k: int):
"""Function that prepares neighborhood_array.
"""Prepares neighborhood_array.
Parameters
----------
File renamed without changes.
6 changes: 3 additions & 3 deletions cce/preprocess.py → cce/preprocessing.py
@@ -16,7 +16,7 @@ def _project_coords(data: list) -> list:


def normalize(data: list) -> list:
"""Perform input data normalization
"""Performs input data normalization.
Parameters
----------
@@ -44,7 +44,7 @@ def normalize(data: list) -> list:


def unique(arr) -> bool:
"""Check if all points in the array of coordinates are unique.
"""Checks if all points in the array of coordinates are unique.
Parameters
----------
@@ -60,7 +60,7 @@ def add_noise_if_duplicates(data: list) -> list:


def add_noise_if_duplicates(data: list) -> list:
"""Add noise to input data
"""Adds noise to input data.
Parameters
----------
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -3,7 +3,7 @@

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXBUILD = python3 -msphinx
SPHINXPROJ = ChannelCapacityEstimator
SOURCEDIR = .
BUILDDIR = _build
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -66,7 +66,7 @@
# The short X.Y version.
version = '1.0'
# The full version, including alpha/beta/rc tags.
release = '1.0'
release = '1.0.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
18 changes: 9 additions & 9 deletions docs/source/cce.rst
@@ -4,18 +4,18 @@ cce package
Submodules
----------

cce\.score module
-----------------
cce\.scoring module
-------------------

.. automodule:: cce.score
.. automodule:: cce.scoring
:members:
:undoc-members:
:show-inheritance:

cce\.preprocess module
----------------------
cce\.preprocessing module
-------------------------

.. automodule:: cce.preprocess
.. automodule:: cce.preprocessing
:members:
:undoc-members:
:show-inheritance:
@@ -28,10 +28,10 @@ cce\.estimator module
:undoc-members:
:show-inheritance:

cce\.optimize module
--------------------
cce\.optimization module
------------------------

.. automodule:: cce.optimize
.. automodule:: cce.optimization
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion tests/test_preprocess.py
@@ -1,5 +1,5 @@
import unittest
from cce.preprocess import normalize
from cce.preprocessing import normalize

LARGE_VALUES_SMALL_SPREAD = [('1', [1e9, 1e9]),
('1', [1e9+1, 1e9+1]),
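
For anyone extending this test module after the rename, a minimal additional
case might look like the sketch below; it assumes only that `normalize`
returns a list with one entry per input point, which matches its
`data: list -> list` signature but is not otherwise verified here.

.. code:: python

    import unittest
    from cce.preprocessing import normalize

    class TestNormalizeShape(unittest.TestCase):
        def test_preserves_number_of_points(self):
            data = [('1', [1e9, 1e9]),
                    ('1', [1e9 + 1, 1e9 + 1]),
                    ('2', [1e9 + 2, 1e9 + 2])]
            self.assertEqual(len(normalize(data)), len(data))

    if __name__ == '__main__':
        unittest.main()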
