Concepts

Torbjørn Rognes edited this page Apr 17, 2016 · 10 revisions

Concepts the course wants to teach are listed below.

Different ways of parallelisation

  1. The tool internally splits parts of the job across multiple threads (shared memory)

Examples:

  • assembly
  • bwa mapping

for the course: script it in Python
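A minimal sketch of the shared-memory idea in Python, assuming a toy task (counting G/C bases) in place of a real tool. The names `gc_count` and `total_gc` are made up for illustration. All threads read the same in-memory list, so nothing is copied or sent between them.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_count(seq):
    """Count G and C bases in one sequence."""
    return sum(1 for base in seq if base in "GCgc")


def total_gc(seqs, n_threads=4):
    """Sum GC counts over all sequences using a pool of threads.

    Every thread works on the same list held in shared memory.
    """
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(gc_count, seqs))


reads = ["ACGT", "GGCC", "ATAT"]  # toy data, not real reads
print(total_gc(reads))  # prints 6
```

Note that in standard CPython the global interpreter lock keeps pure-Python threads from running CPU-bound code in parallel, so this shows the structure rather than a speed-up. Tools such as bwa use native threads and do not have this limit.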

  2. Split the input data file, run the tool on each part, merge the results

Examples:

  • mapping reads (bwa, samtools merge)
  • calling SNPs per chromosome
  • MapReduce

for the course: script it in Python
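The split / run / merge pattern could be sketched like this. `process_chunk` is a hypothetical stand-in for running a real tool on one part of the input (here it just counts reads per length), and all function names are assumptions made for the example.

```python
def split_into_chunks(records, n_chunks):
    """Split a list into n_chunks contiguous parts of nearly equal size."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Stand-in for the tool: count how many reads have each length."""
    counts = {}
    for read in chunk:
        counts[len(read)] = counts.get(len(read), 0) + 1
    return counts


def merge_counts(partials):
    """Merge the per-chunk results into one result."""
    merged = {}
    for part in partials:
        for length, n in part.items():
            merged[length] = merged.get(length, 0) + n
    return merged


def scatter_gather(records, n_chunks, run_map=map):
    """Split, process each chunk, merge. run_map decides how chunks are run."""
    chunks = split_into_chunks(records, n_chunks)
    return merge_counts(run_map(process_chunk, chunks))


reads = ["ACGT", "AC", "ACG", "GT", "TTTT"]  # toy data
print(scatter_gather(reads, n_chunks=2))
```

The default `run_map=map` processes the chunks one after another. Passing the `map` method of a `concurrent.futures` executor instead would run them concurrently without changing the split or merge code, which is also the shape of a MapReduce job.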

  3. MPI-like: split the job across non-shared memory (clusters), with messaging between processes that 1) doesn’t have

Example:

  • Ray assembler

for the course: script it in Python
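A rough sketch of the message-passing style, using only the Python standard library rather than a real MPI implementation. Each worker is a separate process with its own memory and gets its data only through messages on a pipe. The names `worker` and `distributed_sum` are invented for the example, and the "fork" start method is an assumption that holds on Unix-like systems only. On a real cluster the processes would sit on different nodes and an MPI library would carry the messages.

```python
import multiprocessing as mp


def worker(conn):
    """Receive one chunk as a message and send back its partial sum."""
    chunk = conn.recv()
    conn.send(sum(chunk))
    conn.close()


def distributed_sum(chunks):
    """Send each chunk to its own process and add up the replies."""
    ctx = mp.get_context("fork")  # assumption: Unix-like system
    links = []
    for chunk in chunks:
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child_end,))
        proc.start()
        child_end.close()       # the parent keeps only its own end
        parent_end.send(chunk)  # the data travels as a message
        links.append((proc, parent_end))

    total = 0
    for proc, parent_end in links:
        total += parent_end.recv()  # collect one partial result
        proc.join()
    return total


print(distributed_sum([[1, 2, 3], [4, 5], [6]]))  # prints 21
```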

Profiling (measuring performance)

  • cpu usage
  • memory usage
  • disk I/O
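The three measurements above can be sketched for a single Python function call as follows. The helper name `profile` is made up for the example. Two assumptions apply: `tracemalloc` only sees memory allocated by Python itself, and the `resource` module with its block I/O counters is available on Unix-like systems only, where the counters may stay at zero if the data comes from the file cache.

```python
import resource  # Unix-like systems only
import time
import tracemalloc


def profile(func, *args, **kwargs):
    """Run func once and report wall time, CPU time, peak memory, block I/O."""
    usage_before = resource.getrusage(resource.RUSAGE_SELF)
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    result = func(*args, **kwargs)

    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    usage_after = resource.getrusage(resource.RUSAGE_SELF)

    stats = {
        "wall_s": wall_s,          # elapsed real time
        "cpu_s": cpu_s,            # CPU time used by this process
        "peak_bytes": peak_bytes,  # peak Python-level memory during the call
        "blocks_in": usage_after.ru_inblock - usage_before.ru_inblock,
        "blocks_out": usage_after.ru_oublock - usage_before.ru_oublock,
    }
    return result, stats


result, stats = profile(sorted, list(range(100000, 0, -1)))
for name, value in stats.items():
    print(name, value)
```

A CPU time much lower than the wall time suggests the job is waiting, for example on disk I/O, rather than computing.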

HPC architecture

  • single laptop
  • single server
  • cluster (Abel for the course)
  • cloud (Amazon/Azure/Google cloud etc.); cPouta for the course

Accessing HPC infrastructure

  • interactive (ssh)
  • queueing system (PBS, SGE, Slurm); for the course: Slurm
  • qlogin
  • Galaxy (for the course: cPouta)
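For the queueing system route, a job is usually described in a small batch script. The sketch below builds such a script for Slurm from Python. The function name `make_sbatch_script`, the account name, the resource values and the file names in the example command are all placeholders, and a given cluster may require other or additional `#SBATCH` options.

```python
def make_sbatch_script(job_name, command, account,
                       time_limit="01:00:00", cpus=1, mem_per_cpu="2G"):
    """Return the text of a simple Slurm batch script for one command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={account}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu}",
        "",
        "set -o errexit  # stop at the first failing command",
        command,
    ]
    return "\n".join(lines) + "\n"


script = make_sbatch_script(
    job_name="map_reads",
    command="bwa mem ref.fa reads.fq > aln.sam",  # hypothetical file names
    account="myproject",                           # hypothetical account
    cpus=4,
)
print(script)
```

Saved to a file such as `job.sh`, the script would be submitted with `sbatch job.sh`, after which the scheduler decides when and where it runs.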

Workflows

  • linear, for example shell script
  • make/snakemake (but I don’t think we should teach that)
  • Galaxy workflow
  • arrayrun (SLURM)
  • sdag (SLURM)
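The simplest case in the list, a linear workflow, could be sketched in Python instead of a shell script like this. The function name `run_linear_workflow` and the step names are invented for the example, and the steps only call the Python interpreter so that the sketch runs anywhere. In practice each step would be a real tool invocation.

```python
import subprocess
import sys


def run_linear_workflow(steps):
    """Run (name, command) steps in order and stop at the first failure."""
    completed = []
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {name!r} failed with exit code {result.returncode}"
            )
        completed.append(name)
    return completed


# Placeholder steps; real steps would call the actual tools.
steps = [
    ("map", [sys.executable, "-c", "print('mapping reads')"]),
    ("sort", [sys.executable, "-c", "print('sorting alignments')"]),
    ("call", [sys.executable, "-c", "print('calling variants')"]),
]
print(run_linear_workflow(steps))
```

Each step starts only after the previous one has finished successfully, which is the same behaviour as a shell script run with `set -e`.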