veleritas/integrate

Building hetionet: data integration, hetnet permutation, and Neo4j import

Hetnets are networks with multiple types of nodes and edges. This repository creates hetionet v1.0, which is a hetnet encoding biology, disease, and pharmacology. We created hetionet v1.0 for Rephetio, our project to systematically evaluate why drugs work and to predict new therapeutic uses for existing drugs.

Note: this repository is for building hetionet v1.0. Users who want to download and use the completed hetnet should get it from the dhimmel/hetionet repository.

Execution

The dependencies are listed in environment.yml. All are available through Anaconda or PyPI, except for hetio.

  1. precompile.sh executes notebooks which combine multiple resources into a single type of edge. See the contents of compile for more information.

  2. build.sh builds the hetnet, creates permuted derivatives, and exports the hetnet to Neo4j.

Notebooks

  1. integrate.ipynb creates the hetnet by integrating data stored either in compile or elsewhere on GitHub. All GitHub links use commit hashes so that they are version-specific. The JSON-formatted hetnet is exported to data/hetnet.json.bz2.
  2. permute.ipynb loads the created hetnet and creates permuted derivatives that preserve node degree but destroy edge specificity. The permuted hetnets are written to data/permuted, but are not uploaded due to file size.
  3. neo4j-import.ipynb imports the hetnet and its permutations into separate Neo4j instances. These Neo4j instances are not uploaded due to file size and licensing issues. Currently, neo4j-community-2.3.3 is used.
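
To illustrate what "preserve node degree but destroy edge specificity" means for the permutation step, here is a minimal sketch of degree-preserving edge swapping on a single edge type. It is not the code used by permute.ipynb or hetio. The function name `permute_edges`, its `multiplier` and `seed` parameters, and the example node identifiers are all hypothetical. The sketch assumes each edge connects two different node types (for example, a compound and a disease); edges between nodes of the same type would need extra checks for self-loops and inverted duplicates.

```python
import random
from collections import Counter


def permute_edges(edges, multiplier=10, seed=0):
    """Return a shuffled copy of `edges` in which every node keeps its degree.

    `edges` is a list of unique (source, target) pairs for one edge type.
    Each attempt picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), unless that would duplicate an existing edge.
    """
    rng = random.Random(seed)
    edges = list(edges)
    if len(edges) < 2:
        return edges  # nothing to swap
    edge_set = set(edges)
    for _ in range(multiplier * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Skip swaps that would create an edge that already exists
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


# Hypothetical toy edges between placeholder compounds (c*) and diseases (d*)
example_edges = [
    ("c1", "d1"), ("c1", "d2"),
    ("c2", "d2"), ("c2", "d3"),
    ("c3", "d1"), ("c3", "d3"),
]
permuted = permute_edges(example_edges, multiplier=10, seed=42)

# Every source and every target keeps the same number of edges
assert Counter(s for s, _ in permuted) == Counter(s for s, _ in example_edges)
assert Counter(t for _, t in permuted) == Counter(t for _, t in example_edges)
```

Because each swap only exchanges the targets of two edges, the degree of every node is unchanged, while which specific pairs are connected becomes randomized.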

License

All original content in this repository is released as CC0. However, the hetnet integrates data from many resources, and users should consider the licensing of each source. We apply a license attribute on a per-node and per-edge basis for sources with defined licenses. Some resources don't provide any license, so we have requested permission for those. More information is available on Thinklab. See licenses/README.md for a table of all resources and their licensing.
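
As a rough illustration of how a per-edge license attribute could be used downstream, the sketch below tallies and filters edge records by license. The record layout (a `data` dictionary holding a `license` key), the helper names, and the example values are assumptions made for this example, not the documented hetnet schema; check the exported JSON for the actual field names.

```python
from collections import Counter

# Hypothetical edge records; identifiers and field names are placeholders
edges = [
    {"source": "Compound::A", "target": "Disease::X", "kind": "treats",
     "data": {"license": "CC0 1.0"}},
    {"source": "Gene::G", "target": "Disease::X", "kind": "associates",
     "data": {"license": "CC BY 4.0"}},
    {"source": "Gene::G", "target": "Gene::H", "kind": "interacts",
     "data": {}},  # a source that provides no defined license
]


def license_counts(records):
    """Count records per license, grouping missing licenses as 'unspecified'."""
    return Counter(r.get("data", {}).get("license", "unspecified") for r in records)


def filter_by_license(records, allowed):
    """Keep only records whose license is in the `allowed` set."""
    return [r for r in records if r.get("data", {}).get("license") in allowed]


counts = license_counts(edges)
cc0_only = filter_by_license(edges, {"CC0 1.0"})
```

Records without a license attribute are excluded by `filter_by_license`, which matches the conservative reading that an unlicensed source should not be assumed reusable.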
