data

Description

Data files that are too large to store in this GitHub repository have been deposited in Zenodo for versioning. They are downloaded by running ./download_data.sh. In addition to the Zenodo records, information about how the data files were obtained (links to other repositories or scripts, citations) is provided as comments in the shell script.

Pathway definitions files

KEGG PAO1 pathways

The KEGG pathway definitions file ./pao1_data/pseudomonas_KEGG_terms.txt was created using the script ./get_annotations.py. This script accesses Tribe programmatically through its API to retrieve the genes annotated to each KEGG PAO1 pathway (see line 50 in ./download_data.sh).

Additional information: Zelaya RA, Wong AK, Frase AT, Ritchie MD, Greene CS. Tribe: The collaborative platform for reproducible web-based analysis of gene sets. bioRxiv. 2016. doi:10.1101/055913

Nature-NCI PID pathways

The PID pathway definitions file ./tcga_data/PID_pathway_definitions.txt is generated by parsing the file ./tcga_data/c2.cp.v6.0.symbols.gmt, which is provided by MSigDB v6.0 under the Creative Commons Attribution 4.0 International License. We have provided a copy of it here for versioning. The script ./get_pid_pathway_definitions.py parses the .gmt file and produces the PID pathway definitions file (see line 89 in ./download_data.sh).
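As a rough illustration of the parsing step, the sketch below reads a file in GMT layout (one gene set per line: name, description, then gene symbols, all tab-separated) and keeps only the gene sets whose names start with `PID_`. It is not the code in ./get_pid_pathway_definitions.py: the function name `read_pid_pathways`, the prefix filter, and the printed output layout are assumptions made for this example, and the two input lines are made-up placeholders rather than real MSigDB content.

```python
import io


def read_pid_pathways(gmt_handle, prefix="PID_"):
    """Return {pathway name: [gene symbols]} for gene sets whose name starts with prefix.

    Hypothetical helper for illustration; the repository script may differ.
    """
    pathways = {}
    for line in gmt_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]  # fields[1] is the description column
        if name.startswith(prefix):
            pathways[name] = [gene for gene in genes if gene]
    return pathways


# Toy input in GMT layout (placeholder names and genes, not real MSigDB records).
example = io.StringIO(
    "PID_EXAMPLE_PATHWAY\thttp://example.org\tGENE1\tGENE2\tGENE3\n"
    "KEGG_OTHER_PATHWAY\thttp://example.org\tGENE4\tGENE5\n"
)

pid_pathways = read_pid_pathways(example)
for name, genes in pid_pathways.items():
    # Assumed output layout: name, gene count, semicolon-joined genes.
    print(name, len(genes), ";".join(genes), sep="\t")
```

To try this on the real file, pass `open("./tcga_data/c2.cp.v6.0.symbols.gmt")` in place of the `io.StringIO` object.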

P. aeruginosa KEGG network demo server information

The files in ./pao1_web_info were obtained from the following sources:

Sample annotations file

The sample annotations file PseudomonasAnnotation.tsv was retrieved by running

curl -o PseudomonasAnnotation.tsv https://raw.githubusercontent.com/greenelab/adage-server/00b2d19668516910c7b968f4164005cf5a59ddd6/data/PseudomonasAnnotation.tsv

and is from the Greene Lab adage-server repository (data directory).

Additional information: Tan J, Doing G, Lewis KA, Price CE, Chen KM, Cady KC, Perchuk B, Laub MT, Hogan DA, Greene CS. System-wide automatic extraction of functional signatures in Pseudomonas aeruginosa with eADAGE. bioRxiv. 2016. doi:10.1101/078659

Gene identifier information

The gene information file Pseudomonas_aeruginosa_PAO1.gene_info was retrieved by running

curl -O ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Archaea_Bacteria/Pseudomonas_aeruginosa_PAO1.gene_info.gz
gunzip Pseudomonas_aeruginosa_PAO1.gene_info.gz

Additional information: NCBI -- Using Gene
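Once decompressed, the gene_info file is a tab-delimited table whose header line begins with `#tax_id` and includes `GeneID`, `Symbol`, and `LocusTag` columns. The sketch below shows one way such a file could be read to map locus tags to gene symbols. It is only an example of working with the file, not something the repository is known to do: the function name `locus_tag_to_symbol` is hypothetical, and the toy input is truncated to four columns with placeholder GeneID values.

```python
import csv
import io


def locus_tag_to_symbol(handle):
    """Map LocusTag -> Symbol from an NCBI gene_info style table.

    Hypothetical helper for illustration only.
    """
    # QUOTE_NONE: treat quote characters in descriptions as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line starts with "#tax_id"
    symbol_idx = header.index("Symbol")
    locus_idx = header.index("LocusTag")

    mapping = {}
    for row in reader:
        if len(row) <= max(symbol_idx, locus_idx):
            continue  # skip short or blank rows
        locus_tag = row[locus_idx]
        if locus_tag != "-":  # "-" marks a missing value in gene_info
            mapping[locus_tag] = row[symbol_idx]
    return mapping


# Toy input: first four columns only; GeneID values are placeholders.
example = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tLocusTag\n"
    "208964\t1\tdnaA\tPA0001\n"
    "208964\t2\tdnaN\tPA0002\n"
    "208964\t3\texample\t-\n"
)

mapping = locus_tag_to_symbol(example)
print(mapping)
```

For the downloaded file, pass `open("Pseudomonas_aeruginosa_PAO1.gene_info")` instead of the `io.StringIO` object.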