Nightly update
Most data files used by the nightly load are in /var/pomcur/sources/ on the server. The nightly load script updates the files in this directory from their sources as it runs.
The load runs nightly at around 9pm:
(cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load
All output from the nightly load is initially written to a date-stamped subdirectory of /var/www/pombase/dumps/builds/, available as https://curation.pombase.org/dumps/builds/. At the end of a successful run, a symbolic link (latest_build) is made to the latest load output directory. That's available as https://curation.pombase.org/dumps/latest_build/ and https://www.pombase.org/nightly_update/
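The page doesn't show how nightly_load manages these directories, so the following is only a minimal sketch of the pattern described above, under stated assumptions: the publish_build function name, the YYYY-MM-DD directory name (borrowed from the database names used later, e.g. 2021-11-20) and the way the build command is passed in are all hypothetical. What it illustrates is that latest_build is repointed only after the build command succeeds, so a failed run leaves the previous build published.

```shell
# Hypothetical sketch, not the actual nightly_load code.
# Usage: publish_build BUILD_ROOT LINK STAMP COMMAND [ARGS...]
# Runs COMMAND with the new output directory appended as its last
# argument, and repoints LINK at that directory only if COMMAND succeeds.
publish_build() {
    build_root=$1   # e.g. /var/www/pombase/dumps/builds
    link=$2         # e.g. /var/www/pombase/dumps/latest_build
    stamp=$3        # assumed date format, e.g. 2021-11-20
    shift 3

    out_dir="$build_root/$stamp"
    mkdir -p "$out_dir" || return 1

    # A failed build returns early and leaves the existing link untouched.
    "$@" "$out_dir" || return 1

    # Make the new link under a temporary name, then rename it over the
    # old one. GNU mv -T treats LINK as a plain file instead of moving the
    # temporary link into the directory LINK points at, so the swap is a
    # single rename.
    ln -sfn "$out_dir" "$link.tmp" && mv -T "$link.tmp" "$link"
}
```

In a real run the stamp could come from `date +%F`, which prints the date as YYYY-MM-DD. Note that `mv -T` is a GNU coreutils option; on other systems the swap would need a different approach.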
See also: Output data files
Log files are in https://curation.pombase.org/dumps/latest_build/logs and https://www.pombase.org/nightly_update/logs. See the log file wiki page for descriptions of each file.
- get the latest versions of data, code and config from Git (git pull):
  - https://github.com/pombase/pombase-config
  - https://github.com/pombase/pombase-chado
  - https://github.com/pombase/pombase-legacy
  - https://github.com/pombase/website
  - https://github.com/pombase/chobo
- run pombase-legacy/script/make-db to:
  - create a new temporary database (e.g. pombase-chado-base-2021-11-20)
  - initialise it with a Chado schema from pombase-legacy/pombase-chado-base.dump
  - update local copies of ontologies (GO, SO, PRO, FYPO)
  - load OBO files into pombase-chado-base-2021-11-20
  - load misc. PomBase-specific OBO files
  - populate cvtermpath using owltools (via pombase-chado/script/relation-graph-chado-closure.pl)
  - create PomBase-specific materialized views
- run pombase-legacy/etc/load-all.sh to:
  - create pombase-build-2021-11-20 using pombase-chado-base-2021-11-20 as a template
  - load PomBase organisms, properties and references
  - load human, japonicus and cerevisiae features (mostly genes)
  - load pombe gene structures, identifiers, gene names and legacy (non-GO and non-FYPO) annotation from the *.contig files in the PomBase Subversion repo (pombe-embl) using pombase-legacy/script/load-chado.pl
  - load pombe genes that have no location from pombe-embl/supporting_files/features_without_coordinates.txt
  - load pombe GO annotation from pombe-embl/supporting_files/legacy_go_annotations_from_contigs.txt (a GAF file)
    - these annotations were originally in the *.contig files
  - load BioGRID interactions where both interactors are pombe genes
  - load PomBase curated interactions from pombe-embl/external_data/interactions (BioGRID format)
  - load misc. external GO annotations from pombe-embl/external_data/external-go-data/* (GAF format)
  - load http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
  - load GOA pombe annotations (with quite a few filters)
  - load http://snapshot.geneontology.org/annotations/pombase.gaf.gz
  - load pathways from KEGG
  - load pombe identifiers and data from RNAcentral
  - load curated protein modification annotation from files in pombe-embl/external_data/modification_files/
  - load quantitative expression annotation from pombe-embl/external_data/Quantitative_gene_expression_data/
  - load qualitative expression annotation from pombe-embl/external_data/qualitative_gene_expression_data/
  - load curated high-throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/htp_phafs/
  - load curated low-throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/ltp_phafs/
  - load human orthologs from:
    - pombe-embl/orthologs/compara_orths.tsv
    - pombe-embl/orthologs/conserved_multi.txt
    - pombe-embl/orthologs/conserved_one_to_one.txt
  - load japonicus orthologs from
  - load MalaCards disease associations from pombe-embl/external_data/disease/malacards_data_for_chado_mondo_ids.tsv
  - load PomBase curated disease associations from pombe-embl/external_data/disease/pombase_disease_associations_mondo_ids.txt
  - create automatic reciprocal IPI annotations using pombase-chado/script/pombase-process.pl with the add-reciprocal-ipi-annotations function
  - load the Canto curation data from a JSON file that is exported nightly
  - load extra allele synonyms from pombe-embl/supporting_files/allele_synonyms.txt
  - load extra allele comments from pombe-embl/supporting_files/allele_comments.txt
  - use the mapping file pombe-embl/chado_load_mappings/ECO_evidence_mapping.txt to add ECO evidence codes to the annotation in Chado
  - (where possible) add missing allele names using the gene name and allele description
    - using pombase-process.pl with the add-missing-allele-names option
  - update deletion allele names from e.g. SPAC1234c.12delta to abcdelta if SPAC1234c.12 now has a gene name
    - using pombase-process.pl with the update-allele-names option
  - change "with" properties containing a UniProt ID to the corresponding pombe ID
    - using pombase-process.pl with the uniprot-ids-to-local function
  - fix some GO terms in annotations using the mapping file pombe-embl/chado_load_mappings/GO_mapping_to_specific_terms.txt
    - using pombase-process.pl with the change-terms option
  - delete annotations that come from UniProt where there is an identical PomBase annotation
    - using pombase-process.pl with go-filter-duplicate-assigner
  - run the GO filtering process
  - use the PMIDs of annotations to query PubMed for title, authors, abstract etc. and store the results in Chado
  - run the Chado consistency checks defined under the check_chado setting in https://github.com/pombase/pombase-legacy/blob/master/load-pombase-chado.yaml
  - export files from Chado with direct Chado queries, using pombase-chado/script/pombase-export.pl with various options:
    - FYPO: in PHAF format
    - interactions: in BioGRID format
    - pombe-human, pombe-cerevisiae orthologs: TSV format
    - physical interactions: TSV
    - GO substrates: TSV
    - protein modifications: TSV
    - publications_with_annotations.txt for use by the PubMed LinkOut system
    - a log file with counts of annotations by CV
- run the pombase-chado-json executable to create JSON files for the website and extra export files
  - much of its configuration comes from the main web config file
  - https://github.com/pombase/pombase-chado-json/
  - it reads all of the Chado database into memory for fast processing
  - output is written to: https://curation.pombase.org/dumps/latest_build/web-json/ and https://curation.pombase.org/dumps/latest_build/web-json/misc/
  - it generates:
    - a large, compressed JSON file with data for every gene, genotype, term and reference page: https://curation.pombase.org/dumps/latest_build/web-json/api_maps.json.gz
    - JSON for passing to Solr: https://curation.pombase.org/dumps/latest_build/web-json/solr_data/
    - all the text files in the misc directory: https://curation.pombase.org/dumps/latest_build/misc/
    - other small JSON files used by the website for the Quick Search, Advanced Search and for slimming
    - GO GAF, GPAD and GPI files
    - data files for PombeMine (see: Files for InterMine/PombeMine)
- build a Docker image for the website:
  - the web config file
  - containing the JavaScript/TypeScript code and the HTML for the site (built using the Angular framework)
  - the web config file is compiled into the Angular app
  - JBrowse code and the JBrowse data and config
    - the data is FASTA and GFF generated by pombase-chado-json earlier
  - Solr and Solr data files (generated by pombase-chado-json earlier)
  - Python code for running the protein motif search tool and for generating the expression violin plots
  - the main executable, pombase-server, which:
    - serves the HTML, JavaScript, images, JSON data files, sequence data etc.
    - proxies requests to Solr and to the Python protein motif server
    - processes queries
  - the Docker image is configured so that pombase-server, the Python protein motif search server and Solr are started automatically when the image is deployed
- update the Docker image on oliver1 so that http://dev.pombase.kmr.nz/ is updated
- copy the Docker image to the VM at Babraham
- arrange for https://pombase.org (hosted on the VM) to switch to the new Docker image at 6am
- update pombe-embl/ftp_site/pombe/ in Subversion with the files generated by the nightly load
- copy the new exported files to the FTP directory on the Babraham VM (/home/ftp/pombase/)
- copy the latest build directory (/var/www/pombase/dumps/latest_build on oliver1) to the Babraham VM (directory: /home/ftp/pombase/nightly_update), available as https://www.pombase.org/nightly_update/
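Several of the load steps above read GAF files (the legacy contig annotations, the external GO data and the GO Consortium snapshot files), and pombase-chado-json writes GAF files too. As a quick way to sanity-check such a file by hand, here is a minimal sketch that is not part of the PomBase scripts: it tallies annotations by one GAF column. It assumes the GAF 2.x layout (tab-separated, header lines starting with "!", evidence code in column 7, aspect P, F or C in column 9); the gaf_tally name is hypothetical.

```shell
# Hypothetical helper, not part of the PomBase scripts.
# Usage: gaf_tally FILE COLUMN
# Prints "value<TAB>count" for each distinct value in COLUMN (1-based),
# sorted by value, skipping "!" header lines. Files ending in .gz
# (e.g. pombase.gaf.gz) are decompressed with gzip first.
gaf_tally() {
    file=$1
    col=$2
    case $file in
        *.gz) reader="gzip -dc" ;;
        *)    reader="cat" ;;
    esac
    # $reader is deliberately unquoted so "gzip -dc" splits into a
    # command and its option.
    $reader "$file" |
        awk -F '\t' -v col="$col" '
            /^!/ { next }                 # GAF header and comment lines
            NF >= col { count[$col]++ }   # skip blank or short lines
            END { for (v in count) print v "\t" count[v] }' |
        sort
}
```

For example, `gaf_tally pombase.gaf.gz 7` would list each evidence code in a downloaded copy of the snapshot file with its annotation count.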