Nightly update

Data directory

Most data files used by the nightly load are in /var/pomcur/sources/ on the server. The nightly load script downloads updates into that directory as it runs.

Cron job

Runs nightly, starting at about 9pm and finishing at around 5am:

(cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load

This file is the progress log: /var/pomcur/logs/nightly_load.log

If there are no problems with the load the log file will end with something like:

sucessfully finished building: pombase-build-2023-10-10

To check the end of the log file without logging into the server, run:

ssh [email protected] 'tail -50 /var/pomcur/logs/nightly_load.log'

where "USERNAME" is your user name on oliver1.
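
The check above can be wrapped in a small helper. This is a minimal sketch: the function name `load_finished` and the 50-line window are illustrative assumptions, not part of the nightly load code. It matches only the part of the message shown in the example log line.

```shell
# Sketch: succeed if the end of a load log contains the success message.
# load_finished is a hypothetical helper, not part of pombase-legacy.
load_finished() {
    log_file=$1
    # Only look at the last 50 lines, as in the tail command above.
    tail -n 50 "$log_file" | grep -q 'finished building: pombase-build-'
}

# Example use on the server:
#   load_finished /var/pomcur/logs/nightly_load.log && echo "load OK"
```

The function returns a non-zero status when the message is missing, so it can be used directly in an `if` or with `&&`.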

Output directory

All output from the nightly load is initially written to a date-stamped sub-directory of /var/www/pombase/dumps/builds/, available as https://curation.pombase.org/dumps/builds/. At the end of a successful run, a symbolic link (latest_build) is made to the latest load output directory. That's available as: https://curation.pombase.org/dumps/latest_build/ and https://www.pombase.org/nightly_update/
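
The link update described above could look roughly like the following. This is a sketch under assumptions: the helper name `update_latest_link` is hypothetical, and the build directory name (for example pombase-build-2023-10-10) is inferred from the log message rather than confirmed by the load scripts.

```shell
# Sketch: point latest_build at a date-stamped build directory.
# update_latest_link and its arguments are hypothetical names.
update_latest_link() {
    dumps_dir=$1     # e.g. /var/www/pombase/dumps
    build_name=$2    # e.g. pombase-build-2023-10-10 (assumed naming)
    # Refuse to link to a build directory that does not exist.
    [ -d "$dumps_dir/builds/$build_name" ] || return 1
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing latest_build link as a file, not a directory
    ln -sfn "builds/$build_name" "$dumps_dir/latest_build"
}
```

Using a relative link target means the link keeps working if the dumps directory is copied to another machine with the same layout.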

Log files

https://curation.pombase.org/dumps/latest_build/logs and https://www.pombase.org/nightly_update/logs

See the log file wiki page for descriptions of each file.

The `nightly_load` script

gets the latest versions of data, code and config from Git (git pull)
- https://github.com/pombase/pombase-config
- https://github.com/pombase/pombase-chado
- https://github.com/pombase/pombase-legacy
- https://github.com/pombase/website
- https://github.com/pombase/chobo
runs pombase-legacy/script/make-db to:
- create a new temporary database (eg. pombase-chado-base-2021-11-20)
- initialise it with a Chado schema from pombase-legacy/pombase-chado-base.dump
- update local copies of ontologies (GO, SO, PRO, FYPO)
- load OBO files into pombase-chado-base-2021-11-20
- load misc. PomBase specific OBO files
- populate cvtermpath using owltools (via pombase-chado/script/relation-graph-chado-closure.pl)
- create PomBase specific materialized views
runs pombase-legacy/etc/load-all.sh to:
- create pombase-build-2021-11-20 using pombase-chado-base-2021-11-20 as a template
- load PomBase organisms, properties and references
- load human, japonicus and cerevisiae features (mostly genes)
- load pombe gene structures, identifiers, gene names and legacy (non-GO and non-FYPO) annotation from *.contig in the PomBase Subversion repo (pombe-embl) using pombase-legacy/script/load-chado.pl
- load pombe genes that have no location from pombe-embl/supporting_files/features_without_coordinates.txt
- load pombe GO annotation from pombe-embl/supporting_files/legacy_go_annotations_from_contigs.txt (a GAF file)
  - these annotations were originally in the *.contig files
- load BioGRID interactions where both interactors are pombe genes
- load PomBase curated interactions from pombe-embl/external_data/interactions (BioGRID format)
- load misc. external GO annotations from pombe-embl/external_data/external-go-data/* (GAF format)
- load http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
- load GOA pombe annotations (with quite a few filters)
- load http://snapshot.geneontology.org/annotations/pombase.gaf.gz
- load pathways from KEGG
- load pombe identifiers and data from RNAcentral
- load curated protein modification annotation from files in pombe-embl/external_data/modification_files/
- load quantitative expression annotation from pombe-embl/external_data/Quantitative_gene_expression_data/
- load qualitative expression annotation from pombe-embl/external_data/qualitative_gene_expression_data/
- load curated high throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/htp_phafs/
- load curated low throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/ltp_phafs/
- load human orthologs from:
  - pombe-embl/orthologs/compara_orths.tsv
  - pombe-embl/orthologs/conserved_multi.txt
  - pombe-embl/orthologs/conserved_one_to_one.txt
- load japonicus orthologs from:
  - https://github.com/japonicusdb/japonicus-curation/blob/main/compara_pombe_orthologs.tsv
  - https://github.com/japonicusdb/japonicus-curation/blob/main/rhind_pombe_orthologs.tsv
  - https://github.com/japonicusdb/japonicus-curation/blob/main/manual_pombe_orthologs.tsv
- load MalaCards disease associations from pombe-embl/external_data/disease/malacards_data_for_chado_mondo_ids.tsv
- load PomBase curated disease associations from pombe-embl/external_data/disease/pombase_disease_associations_mondo_ids.txt
- create automatic reciprocal IPI annotations using pombase-chado/script/pombase-process.pl with the add-reciprocal-ipi-annotations function
- load the Canto curation data from a JSON file that is exported nightly
- load extra allele synonyms from pombe-embl/supporting_files/allele_synonyms.txt
- load extra allele comments from pombe-embl/supporting_files/allele_comments.txt
- use the mapping file pombe-embl/chado_load_mappings/ECO_evidence_mapping.txt to add ECO evidence codes to the annotation in Chado
- (where possible) add missing allele names using the gene name and allele description
  - using pombase-process.pl with the add-missing-allele-names option
- update deletion allele names from eg. SPAC1234c.12delta to abcdelta if SPAC1234c.12 now has a gene name
  - using pombase-process.pl with the update-allele-names option
- change "with" properties containing a UniProt ID to the corresponding pombe ID
  - pombase-process.pl with the uniprot-ids-to-local function
- fix some GO terms in annotations using the mapping file pombe-embl/chado_load_mappings/GO_mapping_to_specific_terms.txt
  - pombase-process.pl with the change-terms option
- delete annotations that come from UniProt where there is an identical PomBase annotation
  - pombase-process.pl using go-filter-duplicate-assigner
- run the GO filtering process
- use the PMIDs of annotations to query PubMed for title, authors, abstract etc. and store the results in Chado
- run Chado consistency checks defined in https://github.com/pombase/pombase-legacy/blob/master/load-pombase-chado.yaml under the check_chado setting
- export files from Chado using direct Chado queries, using pombase-chado/script/pombase-export.pl with various options
  - FYPO: in PHAF format
  - interactions: in BioGRID format
  - pombe-human, pombe-cerevisiae orthologs: TSV format
  - physical interaction: TSV
  - GO substrates: TSV
  - protein modifications: TSV
  - publications_with_annotations.txt for use by the PubMed link-out system
  - log file with counts of annotations by CV
run the pombase-chado-json executable to create JSON files for the website and extra export files
- lots of configuration from the main web config file
- https://github.com/pombase/pombase-chado-json/
- reads the entire Chado database into memory for fast processing
- output is written to: https://curation.pombase.org/dumps/latest_build/web-json/ and https://curation.pombase.org/dumps/latest_build/web-json/misc/
- it generates:
  - a large, compressed JSON file with data for every gene, genotype, term and reference page: https://curation.pombase.org/dumps/latest_build/web-json/api_maps.json.gz
  - JSON for passing to Solr: https://curation.pombase.org/dumps/latest_build/web-json/solr_data/
  - all the text files in the misc directory: https://curation.pombase.org/dumps/latest_build/misc/
    - see the main config file
  - other small JSON files used by the website for the Quick Search, Advanced Search and for slimming
  - GO GAF, GPAD and GPI files
  - data files for PombeMine - see: Files for InterMine/PombeMine
  - files for APICURON
build a Docker image for the website, containing:
- the web config file
- the JavaScript/TypeScript code and the HTML for the site (built using the Angular framework)
  - the web config file is compiled into the Angular app
- JBrowse code and the JBrowse data and config
  - the data is fasta and GFF generated by pombase-chado-json earlier
- Solr and Solr data files (generated by pombase-chado-json earlier)
- Python code for running the protein motif search tool and for generating the expression violin plots
- the main executable pombase-server
  - serves the HTML, JavaScript, images, JSON data files, sequence data etc.
  - proxies requests to Solr and to the Python protein motif server
  - processes queries
- the Docker image is configured so that pombase-server, the Python protein motif search server and Solr are started automatically when the image is deployed
update the Docker image on oliver1 so that http://dev.pombase.kmr.nz/ is updated
copy the Docker image to the VM at Babraham
arrange for https://pombase.org (hosted on the VM) to swap to use the new Docker image at 6am
update pombe-embl/ftp_site/pombe/ in Subversion with the files generated by the nightly load
copy the new exported files to the ftp directory on the Babraham VM (/home/ftp/pombase/)
copy the latest build directory (/var/www/pombase/dumps/latest_build on oliver1) to the Babraham VM (directory: /home/ftp/pombase/nightly_update), available as https://www.pombase.org/nightly_update/
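
The date-stamped database names used in the make-db and load-all.sh steps above can be sketched as follows. This is an illustration only: `print_db_commands` is a hypothetical helper, and using PostgreSQL's `createdb` with its `-T` (template) option is an assumption about how the scripts create the databases. The commands are printed rather than run.

```shell
# Sketch: derive the two database names for a build date and print the
# commands that would create them (nothing is executed).
# print_db_commands is a hypothetical helper, not part of pombase-legacy.
print_db_commands() {
    build_date=$1                              # e.g. 2021-11-20
    base_db="pombase-chado-base-$build_date"   # temporary base database
    build_db="pombase-build-$build_date"       # final build database
    echo "createdb $base_db"
    # -T copies the base database, so ontologies are only loaded once.
    echo "createdb -T $base_db $build_db"
}

# Example: print_db_commands "$(date +%Y-%m-%d)"
```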
