Kim Rutherford edited this page Nov 21, 2021 · 36 revisions

Nightly update

Data directory

Most data files used by the nightly load are in /var/pomcur/sources/ on the server. The nightly load script updates them from the sources as it runs.

Cron job

Runs nightly at 9pm-ish:

(cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load 
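As a sketch only, a crontab entry along these lines would give that schedule. The exact hour and minute are assumptions (this page only says "9pm-ish"), and the actual entry on the server may differ.

```
# m h dom mon dow  command  (21:00 every day; the exact time is an assumption)
0 21 * * * (cd $HOME/git/pombase-legacy; git pull) && $HOME/git/pombase-legacy/etc/nightly_load
```

Because the two commands are joined with &&, nightly_load only starts if the git pull in pombase-legacy succeeds.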

The nightly_load script

  • gets the latest versions of data, code and config from Git (git pull)
    • https://github.com/pombase/pombase-config
    • https://github.com/pombase/pombase-chado
    • https://github.com/pombase/pombase-legacy
    • https://github.com/pombase/website
    • https://github.com/pombase/chobo
  • runs pombase-legacy/script/make-db to:
    • create a new temporary database (e.g. pombase-chado-base-2021-11-20)
    • initialise it with a Chado schema from pombase-legacy/pombase-chado-base.dump
    • update local copies of ontologies (GO, SO, PRO, FYPO)
    • load OBO files into pombase-chado-base-2021-11-20
    • load misc. PomBase specific OBO files
    • populate cvtermpath using owltools (via pombase-chado/script/relation-graph-chado-closure.pl)
    • create PomBase specific materialized views
  • runs pombase-legacy/etc/load-all.sh to:
    • create pombase-build-2021-11-20 using pombase-chado-base-2021-11-20 as a template
    • load PomBase organisms, properties and references
    • load human, japonicus and cerevisiae features (mostly genes)
    • load pombe gene structures, identifiers, gene names and legacy (non-GO and non-FYPO) annotation from *.contig in the PomBase Subversion repo (pombe-embl) using pombase-legacy/script/load-chado.pl
    • load pombe genes that have no location from pombe-embl/supporting_files/features_without_coordinates.txt
    • load pombe GO annotation from pombe-embl/supporting_files/legacy_go_annotations_from_contigs.txt (a GAF file)
      • these annotations were originally in the *.contig files
    • load BioGRID interactions where both interactors are pombe genes
    • load PomBase curated interactions from pombe-embl/external_data/interactions (BioGRID format)
    • load misc. external GO annotations from pombe-embl/external_data/external-go-data/* (GAF format)
    • load http://snapshot.geneontology.org/products/annotations/pombase-prediction.gaf
    • load GOA pombe annotations (with quite a few filters)
    • load http://snapshot.geneontology.org/annotations/pombase.gaf.gz
    • load pathways from KEGG
    • load pombe identifiers and data from RNAcentral
    • load curated protein modification annotation from files in pombe-embl/external_data/modification_files/
    • load quantitative expression annotation from pombe-embl/external_data/Quantitative_gene_expression_data/
    • load qualitative expression annotation from pombe-embl/external_data/qualitative_gene_expression_data/
    • load curated high throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/htp_phafs/
    • load curated low throughput phenotype annotation from pombe-embl/external_data/phaf_files/chado_load/ltp_phafs/
    • load human orthologs from:
      • pombe-embl/orthologs/compara_orths.tsv
      • pombe-embl/orthologs/conserved_multi.txt
      • pombe-embl/orthologs/conserved_one_to_one.txt
    • japonicus orthologs from
    • load MalaCards disease associations from pombe-embl/external_data/disease/malacards_data_for_chado_mondo_ids.tsv
    • load PomBase curation disease associations from pombe-embl/external_data/disease/pombase_disease_associations_mondo_ids.txt
    • create automatic reciprocal IPI annotations using pombase-chado/script/pombase-process.pl with the add-reciprocal-ipi-annotations function
    • load the Canto curation data from a JSON file that is exported nightly
    • load extra allele synonyms from pombe-embl/supporting_files/allele_synonyms.txt
    • load extra allele comments from pombe-embl/supporting_files/allele_comments.txt
    • use the mapping file pombe-embl/chado_load_mappings/ECO_evidence_mapping.txt to add ECO evidence codes to the annotation in Chado
    • (where possible) add missing allele names using the gene name and allele description
      • using pombase-process.pl with the add-missing-allele-names option
    • update deletion allele names from e.g. SPAC1234c.12delta to abcdelta if SPAC1234c.12 now has a gene name
      • using pombase-process.pl with the update-allele-names option
    • change "with" properties containing a UniProt ID to the corresponding pombe ID
      • pombase-process.pl with the uniprot-ids-to-local function
    • fix some GO terms in annotations using the mapping file pombe-embl/chado_load_mappings/GO_mapping_to_specific_terms.txt
      • pombase-process.pl with the change-terms option
    • delete annotations that come from UniProt where there is an identical PomBase annotation
      • pombase-process.pl using go-filter-duplicate-assigner
    • run the GO filtering process
    • use the PMIDs of annotations to query PubMed for title, authors, abstract etc. and store the results in Chado
    • run Chado consistency checks defined in https://github.com/pombase/pombase-legacy/blob/master/load-pombase-chado.yaml under the check_chado setting
    • export files from Chado using direct Chado queries, using pombase-chado/script/pombase-export.pl with various options
      • GO: formats GAF, GPAD, GPI
      • FYPO: in PHAF format
      • interactions: in BioGRID format
      • pombe-human, pombe-cerevisiae orthologs: TSV format
      • physical interaction: TSV
      • GO substrates: TSV
      • protein modifications: TSV
      • publications_with_annotations.txt for use by the PubMed link-out system
      • log file with counts of annotations by CV
  • builds the website and exports more data files
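The date-stamped database names in the steps above can be sketched as follows. This is an illustration, not the code in make-db or load-all.sh. It assumes that Chado is hosted in PostgreSQL and that "using ... as a template" corresponds to createdb -T. The function name plan_databases is hypothetical. The sketch prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints the database commands for one nightly run instead of
# running them. The real steps live in script/make-db and etc/load-all.sh.

# plan_databases STAMP   (hypothetical helper, not part of pombase-legacy)
#   STAMP is a YYYY-MM-DD date, matching the example names on this page.
plan_databases() {
    stamp="$1"
    base_db="pombase-chado-base-$stamp"
    build_db="pombase-build-$stamp"

    # Assumption: an empty base database is created first, then filled by make-db.
    echo "createdb $base_db"
    # Assumption: the build database is cloned from the base with createdb -T.
    echo "createdb -T $base_db $build_db"
}

# Show the plan for today's run.
plan_databases "$(date +%Y-%m-%d)"
```

Calling plan_databases 2021-11-20 prints the two commands for the example databases named above. Building on a template means the ontology loading done by make-db is not repeated when load-all.sh creates the build database.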