Nightly update
Kim Rutherford edited this page Nov 21, 2021 · 36 revisions
Most data files used by the nightly load are in /var/pomcur/sources/
on the server. The nightly load script updates from the sources as it runs.
Runs nightly at 9pm-ish:
(cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load
- gets the latest versions of data, code and config from Git (git pull)
  - https://github.com/pombase/pombase-config
  - https://github.com/pombase/pombase-chado
  - https://github.com/pombase/pombase-legacy
  - https://github.com/pombase/website
  - https://github.com/pombase/chobo
- runs pombase-legacy/script/make-db to:
  - create a new temporary database (eg. pombase-chado-base-2021-11-20)
  - initialise it with a Chado schema from pombase-legacy/pombase-chado-base.dump
  - update local copies of ontologies (GO, SO, PRO, FYPO)
  - load OBO files into pombase-chado-base-2021-11-20
  - load misc. PomBase specific OBO files
  - populate cvtermpath using owltools (via pombase-chado/script/relation-graph-chado-closure.pl)
  - create PomBase specific materialized views
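The first two make-db steps above (naming a dated temporary database and initialising it from the schema dump) can be sketched in shell as follows. This is only an illustrative dry run that prints commands instead of running them. The function names are hypothetical, and the use of createdb and psql with a plain SQL dump is an assumption: this page does not say how make-db creates the database or what format the dump is in.

```shell
#!/bin/sh
# Hypothetical helper: build the dated database name used in the list above.
# Takes an optional date (YYYY-MM-DD) and defaults to today's date.
base_db_name() {
    printf 'pombase-chado-base-%s\n' "${1:-$(date +%Y-%m-%d)}"
}

# Print (do not run) commands that would create and initialise the database.
# ASSUMPTION: createdb/psql are used and the dump is plain SQL; make-db may
# well do this differently.
print_make_db_commands() {
    db=$(base_db_name "$1")
    printf 'createdb %s\n' "$db"
    printf 'psql -d %s -f pombase-legacy/pombase-chado-base.dump\n' "$db"
}

print_make_db_commands 2021-11-20
```

Printing the commands keeps the sketch safe to run on a machine without PostgreSQL or the PomBase repositories.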
- runs pombase-legacy/etc/load-all.sh to:
  - create pombase-build-2021-11-20 using pombase-chado-base-2021-11-20 as a template
  - load PomBase organisms, properties and references
  - load human, japonicus and cerevisiae features (mostly genes)
  - load pombe gene structures, identifiers, gene names and legacy (non-GO and non-FYPO) annotation from *.contig in the PomBase Subversion repo (pombe-embl) using pombase-legacy/script/load-chado.pl
  - load pombe genes that have no location from pombe-embl/supporting_files/features_without_coordinates.txt
  - load pombe GO annotation from pombe-embl/supporting_files/legacy_go_annotations_from_contigs.txt (a GAF file)
    - these annotations were originally in the *.contig files
  - load BioGRID interactions where both interactors are pombe genes
  - load PomBase curated interactions from pombe-embl/external_data/interactions (BioGRID format)
  - load misc. external GO annotations from pombe-embl/external_data/external-go-data/* (GAF format)
  - load http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
  - load GOA pombe annotations (with quite a few filters)
  - load http://snapshot.geneontology.org/annotations/pombase.gaf.gz
  - load pathways from KEGG
  - load pombe identifiers and data from RNAcentral
  - load curated protein modification annotation from files in pombe-embl/external_data/modification_files/
  - load quantitative expression annotation from pombe-embl/external_data/Quantitative_gene_expression_data/
  - load qualitative expression annotation from pombe-embl/external_data/qualitative_gene_expression_data/
  - load curated high throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/htp_phafs/
  - load curated low throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/ltp_phafs/
  - human orthologs from:
    - pombe-embl/orthologs/compara_orths.tsv
    - pombe-embl/orthologs/conserved_multi.txt
    - pombe-embl/orthologs/conserved_one_to_one.txt
  - japonicus orthologs from
  - load MalaCards disease associations from pombe-embl/external_data/disease/malacards_data_for_chado_mondo_ids.tsv
  - load PomBase curation disease associations from pombe-embl/external_data/disease/pombase_disease_associations_mondo_ids.txt
  - create automatic reciprocal IPI annotations using pombase-chado/script/pombase-process.pl with the add-reciprocal-ipi-annotations function
  - load the Canto curation data from a JSON file that is exported nightly
  - load extra allele synonyms from pombe-embl/supporting_files/allele_synonyms.txt
  - extra allele comments from pombe-embl/supporting_files/allele_comments.txt
  - use the mapping file pombe-embl/chado_load_mappings/ECO_evidence_mapping.txt to add ECO evidence codes to the annotation in Chado
  - (where possible) add missing allele names using the gene name and allele description
    - using pombase-process.pl with the add-missing-allele-names option
  - update deletion allele names from eg. SPAC1234c.12delta to abcdelta if SPAC1234c.12 now has a gene name
    - using pombase-process.pl with the update-allele-names option
  - change "with" properties containing a UniProt ID to the corresponding pombe ID
    - pombase-process.pl with the uniprot-ids-to-local function
  - fix some GO terms in annotations using the mapping file pombe-embl/chado_load_mappings/GO_mapping_to_specific_terms.txt
    - pombase-process.pl with the change-terms option
  - delete annotations that come from UniProt where there is an identical PomBase annotation
    - pombase-process.pl using go-filter-duplicate-assigner
  - run the GO filtering process
  - use the PMIDs of annotations to query PubMed for title, authors, abstract etc. and store the results in Chado
  - run Chado consistency checks defined in https://github.com/pombase/pombase-legacy/blob/master/load-pombase-chado.yaml under the check_chado setting
  - export files from Chado using direct Chado queries, using pombase-chado/script/pombase-export.pl with various options
    - GO: formats GAF, GPAD, GPI
    - FYPO: in PHAF format
    - interactions: in BioGRID format
    - pombe-human, pombe-cerevisiae orthologs: TSV format
    - physical interaction: TSV
    - GO substrates: TSV
    - protein modifications: TSV
    - publications_with_annotations.txt for use by the PubMed link-out system
    - log file with counts of annotations by CV
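The first load-all.sh step, creating the build database from the base database as a template, might look like the sketch below. It only prints the command. The function name is hypothetical, and the use of PostgreSQL's createdb with its -T (template) option is an assumption, since this page does not show how load-all.sh performs the copy.

```shell
#!/bin/sh
# Hypothetical helper: print the createdb command that would copy the base
# database into a new build database for the given date (YYYY-MM-DD).
# ASSUMPTION: createdb -T is used; load-all.sh may do this another way.
print_build_db_command() {
    load_date=$1
    printf 'createdb -T pombase-chado-base-%s pombase-build-%s\n' \
        "$load_date" "$load_date"
}

print_build_db_command 2021-11-20
```

Copying from a template means the ontology loading done by make-db does not have to be repeated for the build database.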
- builds the website and exports more data files
  - uses the pombase-chado-json executable: https://github.com/pombase/pombase-chado-json/
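Going back to the load-all.sh step that updates deletion allele names, the renaming rule (eg. SPAC1234c.12delta becomes abcdelta once SPAC1234c.12 has the gene name abc) can be illustrated with a small shell function. This is a sketch of the rule as described on this page, not the code in pombase-process.pl. The function name is hypothetical, and it assumes that only names of the exact form "systematic ID followed by delta" are rewritten.

```shell
#!/bin/sh
# Hypothetical helper: return the updated name for a deletion allele.
# Arguments: allele name, systematic gene ID, current gene name (may be empty).
# ASSUMPTION: only names of the exact form "<systematic ID>delta" are changed.
updated_deletion_allele_name() {
    allele_name=$1
    systematic_id=$2
    gene_name=$3
    if [ -n "$gene_name" ] && [ "$allele_name" = "${systematic_id}delta" ]; then
        printf '%sdelta\n' "$gene_name"
    else
        # No gene name yet, or not a systematic-ID deletion name: keep as is.
        printf '%s\n' "$allele_name"
    fi
}

updated_deletion_allele_name SPAC1234c.12delta SPAC1234c.12 abc   # prints abcdelta
updated_deletion_allele_name SPAC1234c.12delta SPAC1234c.12 ""    # prints SPAC1234c.12delta
```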