🔥 dracarys - UMCCR Workflow Tidying

🏆 Aim

Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys} will grab files of interest and transform them into ‘tidier’ structures for output into TSV/Parquet/RDS format for downstream ingestion into a database/data lake. See supported workflows, running examples, and CLI options in the sections below.

🍕 Installation

R

remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag

Conda

  • Linux & MacOS (non-M1)
mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env
  • MacOS M1
CONDA_SUBDIR=osx-64 \
  mamba create \
  -n dracarys_env \
  -c umccr -c bioconda -c conda-forge \
  r-dracarys==X.X.X

conda activate dracarys_env
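
The two conda recipes above differ only in the CONDA_SUBDIR override. As a rough sketch, the choice could be derived from `uname`, assuming an Apple Silicon Mac reports `Darwin` and `arm64` (a shell running under Rosetta may report `x86_64` instead). The `conda_subdir_for` helper is hypothetical and not part of dracarys:

```shell
# Hypothetical helper: print the CONDA_SUBDIR override needed for a platform.
# Prints "osx-64" for Apple Silicon macOS and nothing for any other platform.
conda_subdir_for() {
  local os="$1" arch="$2"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    echo "osx-64"
  fi
}

# Inspect the current machine and report which recipe applies.
subdir="$(conda_subdir_for "$(uname -s)" "$(uname -m)")"
if [ -n "$subdir" ]; then
  echo "Apple Silicon detected: prefix 'mamba create' with CONDA_SUBDIR=${subdir}"
else
  echo "No CONDA_SUBDIR override needed"
fi
```

The helper only reports which recipe to use; the `mamba create` command itself stays exactly as shown above.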

Docker

docker pull --platform linux/amd64 ghcr.io/umccr/dracarys:X.X.X

✨ Supported Workflows

{dracarys} supports most outputs from the following DRAGEN/UMCCR workflows:

Workflow              Description
bcl_convert           BCLConvert workflow
tso_ctdna_tumor_only  ctDNA TSO500 workflow
wgs_alignment_qc      DRAGEN DNA (alignment) workflow
wts_alignment_qc      DRAGEN RNA (alignment) workflow
wts_tumor_only        DRAGEN RNA workflow
wgs_tumor_normal      DRAGEN Tumor/Normal workflow
umccrise              umccrise workflow
rnasum                RNAsum workflow
sash                  sash workflow
oncoanalyser          oncoanalyser workflow

See which output files from these workflows are supported in Supported Files.

🌀 CLI

A dracarys.R command line interface is available for convenience.

  • If you’re using the conda package, the dracarys.R command will already be available inside the activated conda environment.
  • If you’re not using the conda package, you need to add the dracarys/inst/cli/ directory to your PATH in order to use dracarys.R:
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
dracarys.R --version
dracarys.R 0.16.0

#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...

🐉 DRAGEN Output Post-Processing 🔥

positional arguments:
  {tidy}         sub-command help
    tidy         Tidy UMCCR Workflow Outputs

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
                       [-l LOCAL_DIR] [-f FORMAT] [-n] [-q]

options:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        ⛄️ Directory with untidy UMCCR workflow results. Can
                        be GDS, S3 or local.
  -o OUT_DIR, --out_dir OUT_DIR
                        🔥 Directory to output tidy results.
  -p PREFIX, --prefix PREFIX
                        🎻 Prefix string used for all results.
  -t TOKEN, --token TOKEN
                        🙈 ICA access token. Default: ICA_ACCESS_TOKEN env var.
  -l LOCAL_DIR, --local_dir LOCAL_DIR
                        📥 If input is a GDS/S3 directory, download the
                        recognisable files to this directory. Default:
                        '<out_dir>/dracarys_<gds|s3>_sync'.
  -f FORMAT, --format FORMAT
                        🎨 Format of output. Default: tsv.
  -n, --dryrun          🐫 Dry run - just show files to be tidied.
  -q, --quiet           😴 Shush all the logs.
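
The help text above gives the default download location for remote inputs as '<out_dir>/dracarys_<gds|s3>_sync'. The sketch below only illustrates how that documented pattern expands for each input type. The `default_local_dir` helper is hypothetical, its trailing-slash handling is an assumption, and it is not how dracarys itself computes the path:

```shell
# Hypothetical helper illustrating the documented --local_dir default:
# '<out_dir>/dracarys_<gds|s3>_sync'. Not dracarys's own implementation.
default_local_dir() {
  local in_dir="$1" out_dir="$2"
  case "$in_dir" in
    gds://*) echo "${out_dir%/}/dracarys_gds_sync" ;;
    s3://*)  echo "${out_dir%/}/dracarys_s3_sync" ;;
    *)       : ;;  # local input: nothing is downloaded, so print nothing
  esac
}

default_local_dir "gds://path/to/subjectX_multiqc_data/" "local_output_dir"
# prints: local_output_dir/dracarys_gds_sync
default_local_dir "s3://path/to/subjectX_multiqc_data/" "local_output_dir/"
# prints: local_output_dir/dracarys_s3_sync
```

Pass -l (--local_dir) explicitly to download the files somewhere other than this default.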

🚕 Running

{dracarys} takes as input (--in_dir) a directory with results from one of the UMCCR workflows. It recursively scans that directory for supported files, downloads them into a local directory (--local_dir) if the input is on GDS or S3, then parses, transforms and writes the tidied versions into the specified output directory (--out_dir). A prefix (--prefix) is prepended to each tidied file. The output file format (--format) can be tsv, parquet, or both. To list the supported files within the input directory without tidying them, use the -n (--dryrun) option.

R

# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)

Mac/Linux

From within an activated conda environment or a shell with the dracarys.R CLI available:

dracarys.R tidy \
      -i gds://path/to/subjectX_multiqc_data/ \
      -o local_output_dir \
      -p subjectX_prefix
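
A common pattern is to do a dry run first and then the full run. The sketch below reuses the placeholder paths from the example above and assumes `both` is accepted literally by -f, as the description of --format suggests. The `run_tidy` wrapper is hypothetical; it prints the command instead of running it when dracarys.R is not on the PATH:

```shell
# Placeholder values taken from the example above.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="local_output_dir"
prefix="subjectX_prefix"

# Hypothetical wrapper: runs 'dracarys.R tidy' with the shared flags plus any
# extra flags, or just prints the command if the CLI is not on PATH.
run_tidy() {
  if command -v dracarys.R >/dev/null 2>&1; then
    dracarys.R tidy -i "$in_dir" -o "$out_dir" -p "$prefix" "$@"
  else
    echo "dracarys.R tidy -i $in_dir -o $out_dir -p $prefix $*"
  fi
}

run_tidy -n        # dry run: only list the files that would be tidied
run_tidy -f both   # full run: write both TSV and Parquet outputs
```

Checking the dry-run listing before the full run avoids downloading and tidying files from the wrong directory.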

Docker

docker container run \
  -v "$(pwd)":/mount1 \
  --platform=linux/amd64 \
  --env "ICA_ACCESS_TOKEN" \
  --rm -it \
  ghcr.io/umccr/dracarys:X.X.X \
    dracarys.R tidy \
      -i gds://path/to/subjectX_multiqc_data/ \
      -o /mount1/output_dir \
      -p subjectX_prefix
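
The `--env "ICA_ACCESS_TOKEN"` flag forwards the variable from the host shell, so it has to be set there before the container starts. A minimal preflight sketch follows, assuming only gds:// inputs require the ICA token. The `needs_ica_token` helper is hypothetical and not part of dracarys:

```shell
# Hypothetical preflight check: treat gds:// inputs as needing an ICA access
# token, which the CLI reads from ICA_ACCESS_TOKEN unless -t/--token is given.
needs_ica_token() {
  case "$1" in
    gds://*) return 0 ;;
    *)       return 1 ;;
  esac
}

in_dir="gds://path/to/subjectX_multiqc_data/"
if needs_ica_token "$in_dir" && [ -z "${ICA_ACCESS_TOKEN:-}" ]; then
  # Warn on stderr; the docker command would otherwise start without a token.
  echo "ICA_ACCESS_TOKEN is not set; export it before running the container" >&2
fi
```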