Kim Rutherford edited this page Oct 10, 2023 · 36 revisions

Data directory

Most data files used by the nightly load are in /var/pomcur/sources/ on the server. The nightly load script downloads updates into that directory as it runs.

Cron job

Runs nightly at 9pm-ish and finishes around 4pm:

(cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load 

This file is the progress log: /var/pomcur/logs/nightly_load.log

If there are no problems with the load the log file will end with something like:

sucessfully finished building: pombase-build-2023-10-10

To check the end of the log file without logging into the server, run:

ssh [email protected] 'tail -50 /var/pomcur/logs/nightly_load.log'

where "USERNAME" is your user name on oliver1.
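
The check above can be wrapped in a small helper that reports whether the most recent run finished cleanly. This is a minimal sketch and not part of the nightly_load script: the function name load_succeeded is hypothetical, and it assumes the success message appears within the last 50 lines of the log in the form shown above ("finished building: pombase-build-" followed by a date).

```shell
# Hypothetical helper: exits 0 if the tail of the log contains the
# "finished building" line for a date-stamped pombase-build, 1 otherwise.
# The default path is the progress log named on this page.
load_succeeded() {
    log_file="${1:-/var/pomcur/logs/nightly_load.log}"
    # Only the last 50 lines are inspected (assumption: the success
    # message is written at the very end of a run).
    tail -n 50 "$log_file" |
        grep -Eq 'finished building: pombase-build-[0-9]{4}-[0-9]{2}-[0-9]{2}'
}
```

When run on the server, `load_succeeded && echo "load OK" || echo "load failed or still running"` prints a one-line status. A missing log file is also reported as a failure, since grep then receives no input.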

Output directory

All output from the nightly load is initially written to a date-stamped sub-directory of /var/www/pombase/dumps/builds/, available as https://curation.pombase.org/dumps/builds/. At the end of a successful run, a symbolic link (latest_build) is made to the latest load output directory. That's available as: https://curation.pombase.org/dumps/latest_build/ and https://www.pombase.org/nightly_update/

See also: Output data files
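
To see which date-stamped directory latest_build currently points to, the symbolic link can be resolved on the server. A minimal sketch: the function name current_build is hypothetical, the default path assumes the link is /var/www/pombase/dumps/latest_build on oliver1 (the path given at the end of this page), and it relies on `readlink -f` as provided by GNU coreutils.

```shell
# Hypothetical helper: print the name of the directory that the
# latest_build symbolic link points to.
current_build() {
    link="${1:-/var/www/pombase/dumps/latest_build}"
    # Return an error if the path is not a symbolic link, for example
    # because no load has completed successfully yet.
    [ -L "$link" ] || return 1
    # Resolve the link and keep only the final path component.
    basename "$(readlink -f "$link")"
}
```

Comparing the printed name with today's date is a quick way to tell whether last night's load replaced the link or whether the site is still serving an older build.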

Log files

https://curation.pombase.org/dumps/latest_build/logs and https://www.pombase.org/nightly_update/logs

See the log file wiki page for descriptions of each file.

The nightly_load script

  • gets the latest versions of data, code and config from Git (git pull)
    • https://github.com/pombase/pombase-config
    • https://github.com/pombase/pombase-chado
    • https://github.com/pombase/pombase-legacy
    • https://github.com/pombase/website
    • https://github.com/pombase/chobo
  • runs pombase-legacy/script/make-db to:
    • create a new temporary database (eg. pombase-chado-base-2021-11-20)
    • initialise it with a Chado schema from pombase-legacy/pombase-chado-base.dump
    • update local copies of ontologies (GO, SO, PRO, FYPO)
    • load OBO files into pombase-chado-base-2021-11-20
    • load misc. PomBase specific OBO files
    • populate cvtermpath using owltools (via pombase-chado/script/relation-graph-chado-closure.pl)
    • create PomBase specific materialized views
  • runs pombase-legacy/etc/load-all.sh to:
    • create pombase-build-2021-11-20 using pombase-chado-base-2021-11-20 as a template
    • load PomBase organisms, properties and references
    • load human, japonicus and cerevisiae features (mostly genes)
    • load pombe gene structures, identifiers, gene names and legacy (non-GO and non-FYPO) annotation from *.contig in the PomBase Subversion repo (pombe-embl) using pombase-legacy/script/load-chado.pl
    • load pombe genes that have no location from pombe-embl/supporting_files/features_without_coordinates.txt
    • load pombe GO annotation from pombe-embl/supporting_files/legacy_go_annotations_from_contigs.txt (a GAF file)
      • these annotations were originally in the *.contig files
    • load BioGRID interactions where both interactors are pombe genes
    • load PomBase curated interactions from pombe-embl/external_data/interactions (BioGRID format)
    • load misc. external GO annotations from pombe-embl/external_data/external-go-data/* (GAF format)
    • load http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
    • load GOA pombe annotations (with quite a few filters)
    • load http://snapshot.geneontology.org/annotations/pombase.gaf.gz
    • load pathways from KEGG
    • load pombe identifiers and data from RNAcentral
    • load curated protein modification annotation from files in pombe-embl/external_data/modification_files/
    • load quantitative expression annotation from pombe-embl/external_data/Quantitative_gene_expression_data/
    • load qualitative expression annotation from pombe-embl/external_data/qualitative_gene_expression_data/
    • load curated high throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/htp_phafs/
    • load curated low throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/ltp_phafs/
    • load human orthologs from:
      • pombe-embl/orthologs/compara_orths.tsv
      • pombe-embl/orthologs/conserved_multi.txt
      • pombe-embl/orthologs/conserved_one_to_one.txt
    • japonicus orthologs from
    • load MalaCards disease associations from pombe-embl/external_data/disease/malacards_data_for_chado_mondo_ids.tsv
    • load PomBase curation disease associations from pombe-embl/external_data/disease/pombase_disease_associations_mondo_ids.txt
    • create automatic reciprocal IPI annotations using pombase-chado/script/pombase-process.pl with the add-reciprocal-ipi-annotations function
    • load the Canto curation data from a JSON file that is exported nightly
    • load extra allele synonyms from pombe-embl/supporting_files/allele_synonyms.txt
    • load extra allele comments from pombe-embl/supporting_files/allele_comments.txt
    • use the mapping file pombe-embl/chado_load_mappings/ECO_evidence_mapping.txt to add ECO evidence codes to the annotation in Chado
    • (where possible) add missing allele names using the gene name and allele description
      • using pombase-process.pl with the add-missing-allele-names option
    • update deletion allele names from eg. SPAC1234c.12delta to abcdelta if SPAC1234c.12 now has a gene name
      • using pombase-process.pl with the update-allele-names option
    • change "with" properties containing a UniProt ID to the corresponding pombe ID
      • pombase-process.pl with the uniprot-ids-to-local function
    • fix some GO terms in annotations using the mapping file pombe-embl/chado_load_mappings/GO_mapping_to_specific_terms.txt
      • pombase-process.pl with the change-terms option
    • delete annotations that come from UniProt where there is an identical PomBase annotation
      • pombase-process.pl using go-filter-duplicate-assigner
    • run the GO filtering process
    • use the PMIDs of annotations to query PubMed for title, authors, abstract etc. and store the results in Chado
    • run Chado consistency checks defined in https://github.com/pombase/pombase-legacy/blob/master/load-pombase-chado.yaml under the check_chado setting
    • export files from Chado using direct Chado queries, using pombase-chado/script/pombase-export.pl with various options
      • FYPO: in PHAF format
      • interactions: in BioGRID format
      • pombe-human, pombe-cerevisiae orthologs: TSV format
      • physical interaction: TSV
      • GO substrates: TSV
      • protein modifications: TSV
      • publications_with_annotations.txt for use by the PubMed link-out system
      • log file with counts of annotations by CV
  • runs the pombase-chado-json executable to create JSON files for the website and extra export files
  • builds a Docker image for the website containing:
    • the web config file
    • the JavaScript/TypeScript code and the HTML for the site (built using the Angular framework)
    • JBrowse code and the JBrowse data and config
      • the data is fasta and GFF generated by pombase-chado-json earlier
    • Solr and Solr data files (generated by pombase-chado-json earlier)
    • Python code for running the protein motif search tool and for generating the expression violin plots
    • the main executable pombase-server
      • serves the HTML, Javascript, images, JSON data files, sequence data etc.
      • proxies requests to Solr and to the Python protein motif server
      • processes queries
    • the Docker image is configured so that pombase-server, the Python protein motif search server and Solr are started automatically when the image is deployed
  • updates the Docker image on oliver1 so that http://dev.pombase.kmr.nz/ is updated
  • copies the Docker image to the VM at Babraham
  • arranges for https://pombase.org (hosted on the VM) to swap to the new Docker image at 6am
  • updates pombe-embl/ftp_site/pombe/ in Subversion with the files generated by the nightly load
  • copies the new exported files to the ftp directory on the Babraham VM (/home/ftp/pombase/)
  • copies the latest build directory (/var/www/pombase/dumps/latest_build on oliver1) to the Babraham VM (directory: /home/ftp/pombase/nightly_update), available as https://www.pombase.org/nightly_update/
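
The database naming used by the make-db and load-all.sh steps above can be illustrated with a short sketch. It is not taken from the scripts themselves: the function names are hypothetical, and it assumes that the date stamp is the ISO date of the run and that the template copy is made with PostgreSQL's createdb (the real scripts may create the database in a different way). The last function only prints the command and does not run it.

```shell
# Hypothetical helpers illustrating the naming scheme described above.
base_db_name()  { echo "pombase-chado-base-$1"; }
build_db_name() { echo "pombase-build-$1"; }

# Print (do not run) a createdb command that would create the build
# database as a copy of the base database. "createdb -T" names the
# template database to copy from.
print_create_build_db() {
    # Default to today's date in YYYY-MM-DD form (an assumption about
    # how the scripts stamp their database names).
    stamp="${1:-$(date +%Y-%m-%d)}"
    echo "createdb -T $(base_db_name "$stamp") $(build_db_name "$stamp")"
}
```

For example, `print_create_build_db 2021-11-20` prints `createdb -T pombase-chado-base-2021-11-20 pombase-build-2021-11-20`, matching the example names in the list above. Keeping the ontology-loaded base database separate from the build database means the load steps always start from the same freshly prepared template.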