Text mining of the biomedical literature has been successful in retrieving interactions between proteins, non-coding RNAs, and chemicals as well as in determining tissue-specific expression and subcellular localization. Simple co-occurrence-based scoring schemes can uncover such associations by finding entity pairs that are frequently mentioned together but ignore the textual context of each co-occurrence.
CoCoScore implements an improved context-aware co-occurrence scoring scheme that uses textual context to assess whether an association is described in a given sentence or not. CoCoScore achieves superior performance compared to previous approaches that rely on constant sentence scores, based on datasets of disease-gene, tissue-gene, and protein-protein associations. In our research, we use distant supervision to create an automatic, but noisy, labelling of a large dataset of sentences co-mentioning two entities of interest.
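The difference between constant and context-aware sentence scores can be illustrated with a small toy sketch. This is not CoCoScore's actual implementation or aggregation formula (see the manuscript for those). The function names, the cue-word scorer, and the example sentences are all hypothetical and serve only to show the idea.

```python
from collections import defaultdict


def aggregate_pair_scores(sentences, sentence_score):
    """Sum per-sentence scores for each co-mentioned entity pair.

    `sentences` is an iterable of (entity_a, entity_b, text) tuples.
    Summation is a simplification chosen for illustration only.
    """
    totals = defaultdict(float)
    for entity_a, entity_b, text in sentences:
        pair = tuple(sorted((entity_a, entity_b)))  # order-independent key
        totals[pair] += sentence_score(text)
    return dict(totals)


def constant_score(text):
    """Baseline: every co-mentioning sentence counts the same."""
    return 1.0


def toy_context_score(text):
    """Hypothetical stand-in for a trained sentence classifier."""
    cue_words = {"causes", "binds", "regulates"}
    words = set(text.lower().split())
    return 0.9 if words & cue_words else 0.1


# Made-up example sentences, not taken from any real dataset.
sentences = [
    ("GENE1", "DISEASE1", "GENE1 causes DISEASE1 in mice"),
    ("GENE1", "DISEASE1", "GENE1 and DISEASE1 were both studied"),
    ("GENE2", "DISEASE1", "GENE2 was measured in DISEASE1 patients"),
]

baseline = aggregate_pair_scores(sentences, constant_score)
context_aware = aggregate_pair_scores(sentences, toy_context_score)
print(baseline)
print(context_aware)
```

With constant scores, both GENE1 sentences contribute equally. With the toy context scorer, the sentence containing "causes" accounts for most of that pair's score, and a pair that is only mentioned in passing receives a low total.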
Free software: MIT license
To install CoCoScore via bioconda (for Linux and Mac OS):
conda install -c bioconda cocoscore
To install CoCoScore via pip:
pip install cocoscore
CoCoScore depends on fastText which needs to be installed separately if CoCoScore was installed via pip. The installation via bioconda automatically installs fastText, too.
If you installed CoCoScore via pip, please build v0.1.0 of fastText as described here and make sure the fasttext binary is discoverable via your $PATH environment variable.
fastText v0.1.0 is also available via conda-forge:
conda install -c conda-forge fasttext=0.1.0
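Whichever way fastText was installed, one quick way to check that the fasttext binary is discoverable via $PATH is the following minimal sketch. It uses only the Python standard library; the helper name find_fasttext is hypothetical and not part of CoCoScore.

```python
import shutil


def find_fasttext(binary_name="fasttext"):
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary_name)


path = find_fasttext()
if path is None:
    print("fasttext was not found on PATH")
else:
    print(f"fasttext found at {path}")
```

If the binary is not found, add the directory containing it to $PATH and run the check again.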
CoCoScore Docker container:
Bioconda automatically builds a Docker container for CoCoScore. See the package documentation for more information.
- Follow the installation instructions above.
- Download the demo.ftz file (see next section) needed to run through the example.
- Run through the example to learn how to apply CoCoScore to your own data.
Before running the example, please download the demo.ftz file and save it to doc/example/. The file can be downloaded and placed in the correct directory by executing:
wget -P doc/example/ http://download.jensenlab.org/BLAH4/demo.ftz
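On systems without wget, the same download could be done from Python. The following is a minimal sketch using only the standard library; the helper names target_path and download_demo are hypothetical and not part of CoCoScore.

```python
from pathlib import Path
from urllib.request import urlretrieve

DEMO_URL = "http://download.jensenlab.org/BLAH4/demo.ftz"


def target_path(url, directory="doc/example"):
    """Build the local path: <directory>/<last component of the URL>."""
    return Path(directory) / url.rsplit("/", 1)[-1]


def download_demo(url=DEMO_URL, directory="doc/example"):
    """Download the file into `directory` unless it is already there."""
    dest = target_path(url, directory)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urlretrieve(url, dest)  # performs the actual network request
    return dest
```

Calling download_demo() from the repository root saves the file as doc/example/demo.ftz, the same location that the wget command above uses.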
A preprint manuscript describing CoCoScore and its performance on eight datasets, compared to a baseline co-occurrence scoring model, is available via bioRxiv.
Supplementary data described in the manuscript can be downloaded via figshare.
CoCoScore is being developed by Alexander Junge and Lars Juhl Jensen at the Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark.
Please open an issue here or write to us:
{alexander.junge,lars.juhl.jensen} AT cpr DOT ku DOT dk
See also: https://github.com/JungeAlexander/cocoscore/blob/master/CONTRIBUTING.rst
To run all the tests, run:
tox
Note: to combine the coverage data from all the tox environments, run:
Platform | Command |
---|---|
Windows | set PYTEST_ADDOPTS=--cov-append followed by tox |
Other | PYTEST_ADDOPTS=--cov-append tox |