Tool for downloading a specific subnetwork of the KnowEnG Knowledge Network
This repo contains the Python3 `kn_fetcher.py` script that performs the subnetwork extraction, a `Dockerfile` to create a Docker image, and a `Dockstore.cwl` that is used by the Dockstore to register this Docker image container and describe how to call kn_fetcher for the community.
There are four input parameters to this tool that must be specified on the command line in order.
Position | Argument | Description | Example or allowed value |
---|---|---|---|
1 | BUCKET | Name of S3 bucket where KN is stored | e.g. KnowNets/KN-20rep-1706/userKN-20rep-1706 |
2 | NETWORKTYPE | Type of subnetwork to be fetched | Must be Gene or Property |
3 | TAXONID | Taxon identifier for species to be fetched | e.g. for human '9606' |
4 | EDGETYPE | Keyword for subnetwork edgetype to be fetched | e.g. 'gene_ontology' |
For example, to pull all the Gene Ontology annotations for human genes from the latest KN Build on S3:
/home/kn_fetcher.sh KnowNets/KN-20rep-1706/userKN-20rep-1706 Property 9606 gene_ontology
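The argument order and the `Gene`/`Property` restriction described above can be checked before the script is called. The following is a minimal sketch, not part of this repo: `build_fetch_args` and `ALLOWED_NETWORK_TYPES` are hypothetical names, and the default script path is simply the one used in the example command.

```python
import shlex

# Allowed NETWORKTYPE values, taken from the argument table above.
ALLOWED_NETWORK_TYPES = ("Gene", "Property")


def build_fetch_args(bucket, network_type, taxon_id, edge_type,
                     script="/home/kn_fetcher.sh"):
    """Hypothetical helper: return the command as a list, with the four
    positional arguments in the documented order."""
    if network_type not in ALLOWED_NETWORK_TYPES:
        raise ValueError(
            f"NETWORKTYPE must be one of {ALLOWED_NETWORK_TYPES}, "
            f"got {network_type!r}"
        )
    # TAXONID is passed as text on the command line, so coerce it to str.
    return [script, bucket, network_type, str(taxon_id), edge_type]


if __name__ == "__main__":
    args = build_fetch_args(
        "KnowNets/KN-20rep-1706/userKN-20rep-1706",
        "Property", 9606, "gene_ontology",
    )
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(a) for a in args))
```

Returning a list rather than a single string keeps the arguments separate, so the result can be handed to `subprocess.run` without shell quoting issues.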
To find the list of TAXONID identifiers supported by the current version of KnowEnG, please visit this link.
To find the list of EDGETYPE identifiers supported by the current version of KnowEnG, please visit this link.
The output of this tool is three or four files in the current working directory. Apart from the YAML file, they are tab-separated.
- The columns of this file are defined as follows:
- Node1_id: the internal identifier for the source node of the edge
- Node2_id: the internal identifier for the target node of the edge
- Edge_weight: normalized weight of the edge in the subnetwork
- Edge_type: subnetwork edge type for the edge
- Source_id: internal identifier for the public source file the edge was extracted from
- Line_num: original line number of edge information in the public source file
- This YAML file contains information about the extracted Knowledge Network subnetwork. Its keys include summaries of the network size (“data”), its public data source details (“datasets”), information about the meaning of its edges (“edge_type”), and some commands and configurations used in its construction (“export”).
- The columns of this file are defined as follows:
- Internal_id: the internal identifier for a node in the subnetwork
- Mapped_id: the mapped internal identifier for a node in the subnetwork
- Node_type: type of node 'Gene' or 'Property'
- Node_alias: common name for network node
- Node_description: full name/description for network node
- This file is produced only when `NETWORKTYPE` is `Property`. It contains information about the property nodes of the subnetwork, in the same format as C) `TAXONID.EDGETYPE.node_map`.
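The column definitions above map naturally onto small record types. The sketch below reads the edge and node map formats with Python's standard `csv` module. It assumes the files have no header row and that every row has exactly the listed columns; neither is confirmed here. `Edge`, `Node`, `read_edges` and `read_nodes` are hypothetical names, and the demo rows are made up.

```python
import csv
from collections import namedtuple

# Field names follow the column lists above (lower-cased).
Edge = namedtuple(
    "Edge", "node1_id node2_id edge_weight edge_type source_id line_num"
)
Node = namedtuple(
    "Node", "internal_id mapped_id node_type node_alias node_description"
)


def read_edges(lines):
    """Parse tab-separated edge rows (assumed: no header row)."""
    edges = []
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        node1, node2, weight, edge_type, source_id, line_num = row
        edges.append(
            Edge(node1, node2, float(weight), edge_type, source_id, line_num)
        )
    return edges


def read_nodes(lines):
    """Parse tab-separated node map rows (assumed: no header row)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [Node(*row) for row in reader if row]


if __name__ == "__main__":
    # Made-up rows for illustration only; real identifiers will differ.
    demo_edges = ["nodeA\tnodeB\t0.5\tgene_ontology\tsrc1\t12\n"]
    demo_nodes = ["nodeA\tnodeA\tProperty\talias\tfull description\n"]
    print(read_edges(demo_edges)[0])
    print(read_nodes(demo_nodes)[0])
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `read_nodes(open("9606.gene_ontology.node_map", newline=""))` if the file name follows the `TAXONID.EDGETYPE.node_map` pattern.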
With Docker installed, change to the directory where you want the subnetwork files downloaded and run:
docker run --rm -w=`pwd` -v `pwd`:`pwd` knoweng/kn_fetcher:latest \
/home/kn_fetcher.sh KnowNets/KN-20rep-1706/userKN-20rep-1706 Property 9606 gene_ontology
You'll then see three or four files in the current directory. The `-w` option sets the working directory for the container, and the `-v` option mounts the current directory into the container as a volume.
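To launch the same container from a Python script, the `docker run` command above can be assembled as an argument list. This is a minimal sketch under that assumption: `docker_fetch_command` and `run_fetch` are hypothetical helpers, and the image name and script path are copied from the command above.

```python
import os
import subprocess


def docker_fetch_command(workdir, fetch_args,
                         image="knoweng/kn_fetcher:latest"):
    """Hypothetical helper: build the docker run command as a list.

    workdir is used both as the container working directory (-w) and as
    the host path mounted at the same location inside the container (-v).
    """
    return [
        "docker", "run", "--rm",
        "-w", workdir,
        "-v", f"{workdir}:{workdir}",
        image,
        "/home/kn_fetcher.sh", *fetch_args,
    ]


def run_fetch(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = docker_fetch_command(
        os.getcwd(),
        ["KnowNets/KN-20rep-1706/userKN-20rep-1706",
         "Property", "9606", "gene_ontology"],
    )
    # Only print here; call run_fetch(cmd) to actually start the container.
    print(cmd)
```

Mounting the directory at the same path inside the container mirrors the `pwd`:`pwd` form of the shell command, so the output files appear in the host directory where the command was started.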
A sample job parameters file for running a kn_fetcher job with a CWL tool runner is provided as `kn_fetcher.job.yml`:
network_type: "Gene"
taxon: "9606"
edge_type: "STRING_textmining"
This template can be modified as needed and passed, together with the kn_fetcher CWL description `kn_fetcher.cwl`, to a CWL runner tool for execution.
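When many subnetworks are needed, job files like the sample can be generated rather than edited by hand. The sketch below renders only the three keys shown in the sample (`network_type`, `taxon`, `edge_type`). `render_job_yaml` is a hypothetical helper, and it relies on the fact that a JSON string literal is also a valid double-quoted YAML scalar.

```python
import json


def render_job_yaml(network_type, taxon, edge_type):
    """Hypothetical helper: return job-file text with the three keys
    that appear in the sample kn_fetcher.job.yml."""
    fields = {
        "network_type": network_type,
        "taxon": str(taxon),  # quoted so the taxon id stays a string
        "edge_type": edge_type,
    }
    # json.dumps adds the double quotes and escapes special characters.
    return "".join(
        f"{key}: {json.dumps(value)}\n" for key, value in fields.items()
    )


if __name__ == "__main__":
    print(render_job_yaml("Gene", 9606, "STRING_textmining"), end="")
```

The returned text can be written to a file and passed to the CWL runner in place of the provided template.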
You can also run the tool directly without Docker:
git clone https://github.com/KnowEnG/KN_Fetcher.git
cd KN_Fetcher
./kn_fetcher.sh KnowNets/KN-20rep-1706/userKN-20rep-1706 Property 9606 gene_ontology
Normally you would use the prebuilt `knoweng/kn_fetcher:latest` image. If you need to build the image manually, run:
git clone https://github.com/KnowEnG/KN_Fetcher.git
cd KN_Fetcher
docker build -t kn_fetcher .
A list of the current contents of the Knowledge Network can be found here.