🔥 dracarys - UMCCR Workflow Tidying


🏆 Aim

Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} grabs the files of interest and transforms them into ‘tidier’ structures, written in TSV/Parquet/RDS format for downstream ingestion into a database or data lake. See supported workflows, running examples, and CLI options in the sections below.

🍕 Installation

R

remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag

Conda

  • Linux & MacOS (non-M1)
mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env
  • MacOS M1
CONDA_SUBDIR=osx-64 \
  mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env

Docker

docker pull --platform linux/amd64 ghcr.io/umccr/dracarys:X.X.X

✨ Supported Workflows

{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:

Workflow               Description
bcl_convert            BCLConvert workflow
tso_ctdna_tumor_only   ctDNA TSO500 workflow
wgs_alignment_qc       DRAGEN DNA (alignment) workflow
wts_alignment_qc       DRAGEN RNA (alignment) workflow
wts_tumor_only         DRAGEN RNA workflow
wgs_tumor_normal       DRAGEN Tumor/Normal workflow
umccrise               umccrise workflow
rnasum                 RNAsum workflow
sash                   sash workflow
oncoanalyser           oncoanalyser workflow

See which output files from these workflows are supported in Supported Files.

🌀 CLI

A dracarys.R command line interface is available for convenience.

  • If you’re using the conda package, the dracarys.R command will already be available inside the activated conda environment.
  • If you’re not using the conda package, you need to add the dracarys/inst/cli/ directory to your PATH in order to use dracarys.R.
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0

#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...

🐉 DRAGEN Output Post-Processing 🔥

positional arguments:
  {tidy}         sub-command help
    tidy         Tidy UMCCR Workflow Outputs

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
                       [-l LOCAL_DIR] [-f FORMAT] [-n] [-q]

options:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        ⛄️ Directory with untidy UMCCR workflow results. Can
                        be GDS, S3 or local.
  -o OUT_DIR, --out_dir OUT_DIR
                        🔥 Directory to output tidy results.
  -p PREFIX, --prefix PREFIX
                        🎻 Prefix string used for all results.
  -t TOKEN, --token TOKEN
                        🙈 ICA access token. Default: ICA_ACCESS_TOKEN env var.
  -l LOCAL_DIR, --local_dir LOCAL_DIR
                        📥 If input is a GDS/S3 directory, download the
                        recognisable files to this directory. Default:
                        '<out_dir>/dracarys_<gds|s3>_sync'.
  -f FORMAT, --format FORMAT
                        🎨 Format of output. Default: tsv.
  -n, --dryrun          🐫 Dry run - just show files to be tidied.
  -q, --quiet           😴 Shush all the logs.
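
The help text above documents the default for --local_dir as '<out_dir>/dracarys_<gds|s3>_sync'. The sketch below only mirrors that documented pattern so you can predict where downloaded files should land; it is not taken from the package. The function name default_local_dir and the s3:// prefix check are assumptions made for illustration.

```shell
# Sketch only: mirrors the documented default '<out_dir>/dracarys_<gds|s3>_sync'.
# default_local_dir is a hypothetical helper, not part of dracarys.
default_local_dir() {
  # $1 = input directory (URI or local path), $2 = output directory
  case "$1" in
    gds://*) printf '%s\n' "$2/dracarys_gds_sync" ;;
    s3://*)  printf '%s\n' "$2/dracarys_s3_sync" ;;  # assumes S3 inputs use an s3:// prefix
    *)       printf '\n' ;;                          # local input: nothing is downloaded
  esac
}

# Prints: local_output_dir/dracarys_gds_sync
default_local_dir "gds://path/to/subjectX_multiqc_data/" "local_output_dir"
```

Passing -l explicitly overrides this default, which can be useful when the output directory sits on a small volume.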

🚕 Running

{dracarys} takes as input (--in_dir) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files and, if the input is a GDS or S3 location, downloads those files into a local directory (--local_dir). It then parses, transforms, and writes the tidied versions into the specified output directory (--out_dir). A prefix (--prefix) is prepended to each of the tidied files. The output file format (--format) can be tsv, parquet, or both. To only list the supported files found in the input directory, use the -n (--dryrun) option.

R

# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)

Mac/Linux

From within an activated conda environment or a shell with the dracarys.R CLI available:

dracarys.R tidy \
      -i gds://path/to/subjectX_multiqc_data/ \
      -o local_output_dir \
      -p subjectX_prefix
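
A common pattern, going by the options described above, is to do a dry run first (-n) and then the real run with a chosen output format (-f). The sketch below wraps the command in a small helper; run_tidy is a hypothetical name, and the paths and prefix are the same placeholders as in the example above. If dracarys.R is not on the PATH, the helper only prints the command it would have run.

```shell
# Placeholder inputs; replace with real values.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="local_output_dir"
prefix="subjectX_prefix"

# run_tidy is a hypothetical wrapper, not part of dracarys.
# Any extra flags (e.g. -n, -f parquet) are appended after the required ones.
run_tidy() {
  if command -v dracarys.R >/dev/null 2>&1; then
    dracarys.R tidy -i "${in_dir}" -o "${out_dir}" -p "${prefix}" "$@"
  else
    # CLI not available: show the command instead of running it.
    echo "dracarys.R tidy -i ${in_dir} -o ${out_dir} -p ${prefix} $*"
  fi
}

# 1. Dry run: just list the files that would be tidied.
run_tidy -n
# 2. Real run, writing Parquet instead of the default TSV.
run_tidy -f parquet
```

Checking the dry-run listing first avoids downloading and parsing a large directory only to find that the expected files were not recognised.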

Docker

docker container run \
  -v "$(pwd)":/mount1 \
  --platform=linux/amd64 \
  --env "ICA_ACCESS_TOKEN" \
  --rm -it \
  ghcr.io/umccr/dracarys:X.X.X \
    dracarys.R tidy \
      -i gds://path/to/subjectX_multiqc_data/ \
      -o /mount1/output_dir \
      -p subjectX_prefix
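
The Docker command above forwards ICA_ACCESS_TOKEN from the host with --env, and the CLI falls back to that variable when -t is not given. A small pre-flight check can catch an unset token before the container starts. This is a sketch under the assumption that only gds:// inputs need the ICA token; check_ica_token is a hypothetical helper, not part of dracarys.

```shell
# check_ica_token is a hypothetical helper, not part of dracarys.
# Assumption: only gds:// inputs require an ICA access token.
check_ica_token() {
  # $1 = input directory (URI or local path)
  case "$1" in
    gds://*)
      if [ -z "${ICA_ACCESS_TOKEN:-}" ]; then
        echo "ICA_ACCESS_TOKEN is not set; export it or pass -t TOKEN" >&2
        return 1
      fi
      ;;
  esac
  return 0
}

# Example: warn before launching a run against a GDS path.
check_ica_token "gds://path/to/subjectX_multiqc_data/" \
  || echo "Skipping run: no ICA token available"
```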