SNAPPY - SNAPPY Nucleic Acid Processing Pipeline

Installation

Installation should be complete in 10 to 15 minutes.

In a nutshell:

# Download & preparation
git clone git@github.com:bihealth/snappy-pipeline.git
cd snappy-pipeline

# If you want to select a given branch, uncomment the following:
# git checkout <branch_name>

# WARNING: make sure that you are in your conda base environment

# Create conda environment "snappy_env" with all requirements:
mamba env create --file environment.yml -n snappy_env
conda activate snappy_env

# Install snappy in snappy_env environment
pip install -e ".[all]"

The dependency group all includes all optional dependencies, i.e. test (for running tests with pytest), dev (for formatting, linting, pre-commit hooks) and docs (for building the documentation with sphinx). If you only want to install the core dependencies, you can omit the [all] part, or choose any combination of the other groups.
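To make the "any combination of the other groups" option concrete, here is a minimal sketch. The helper `extras_spec` is hypothetical (it is not part of snappy); it only builds the requirement string that pip expects for the group names mentioned above.

```shell
# Hypothetical helper (not part of snappy): build a pip requirement
# string such as ".[test,docs]" from a list of dependency group names.
# The function body runs in a subshell so that changing IFS does not leak.
extras_spec() (
    if [ "$#" -eq 0 ]; then
        # No groups given: core dependencies only
        echo "."
    else
        # "$*" joins the arguments with the first character of IFS
        IFS=,
        echo ".[$*]"
    fi
)

# Equivalent direct commands (run inside the activated snappy_env):
#   pip install -e .                # core dependencies only
#   pip install -e ".[test]"        # core + test
#   pip install -e ".[test,docs]"   # core + test + docs
```

With the helper defined, `pip install -e "$(extras_spec test docs)"` would install the core package together with the test and docs groups.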

See user installation if you just want to use the pipeline.

See developer installation for getting started with working on the pipeline code and also building the documentation.

Using GATK3

Some wrappers rely on GATK 3. GATK v3 is not free software and cannot be redistributed. Earlier, we had an internal CUBI conda server, but this limited use of the wrappers by the general public. Now, GATK 3 must be registered in each conda environment used by the pipeline steps that depend on it, as follows.

If you are a member of CUBI, you can use the central GATK download. Alternatively, you can download the tarball from the Broad archive.

$ ls -lh /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
-rw-rw---- 1 holtgrem_c hpc-ag-cubi 14M Dec 19  2019 /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2

First, go to the pipeline directory where you want to run:

$ cd variant_calling

Explicitly create any missing conda environments:

$ snappy-snake --conda-create-envs-only
[...]
12-27 17:18 snakemake.logging WARNING  Downloading and installing remote packages.
[...]

Find out which conda environments use GATK v3:

$ grep 'gatk.*3' .snakemake/conda/*.yaml
.snakemake/conda/d76b719b718c942f8e49e55059e956a6.yaml:  - gatk =3
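Note that the pattern `gatk.*3` is fairly loose: it matches any line that contains `gatk` followed later by a `3`. The sketch below illustrates this with two throwaway environment files in a temporary directory. The file contents, in particular the `gatk4 =4.1.3` entry, are hypothetical and only serve to show the difference between the loose pattern and a stricter one.

```shell
# Create two hypothetical environment files in a temporary directory
tmpdir="$(mktemp -d)"
printf 'dependencies:\n  - gatk =3\n' > "$tmpdir/a.yaml"
printf 'dependencies:\n  - gatk4 =4.1.3\n' > "$tmpdir/b.yaml"

# Loose pattern from the step above: matches both files
loose="$(grep -l 'gatk.*3' "$tmpdir"/*.yaml | wc -l | tr -d ' ')"

# Stricter pattern (an assumption about the YAML layout):
# a list item named exactly "gatk" pinned to a version starting with 3
strict="$(grep -lE '^[[:space:]]*-[[:space:]]*gatk[[:space:]]*=[[:space:]]*3' "$tmpdir"/*.yaml | wc -l | tr -d ' ')"

echo "loose pattern matched $loose file(s), strict pattern matched $strict file(s)"
rm -rf "$tmpdir"
```

If your environments contain only GATK 3 entries like the one shown above, the loose pattern is sufficient.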

Activate each conda environment and register the GATK 3 tarball in it:

$ for yaml in $(grep -l 'gatk.*3' .snakemake/conda/*.yaml); do
        environ="${yaml%.yaml}"
        conda activate "$environ"
        gatk3-register /fast/groups/cubi/work/projects/biotools/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
        conda deactivate
    done
Moving GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2 to /home/holtgrem_c/miniconda3/envs/gatk3/opt/gatk-3.8

You are now ready to run GATK v3 from this environment.
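The registration loop relies on the shell parameter expansion `${yaml%.yaml}`, which strips the `.yaml` suffix so that the remaining path points at the environment directory next to the YAML file. A minimal sketch, using the file name from the `grep` output above:

```shell
# Path of an environment definition as reported by grep
yaml=".snakemake/conda/d76b719b718c942f8e49e55059e956a6.yaml"

# "%" removes the shortest matching suffix, here the literal ".yaml"
environ="${yaml%.yaml}"

echo "$environ"
```

The resulting path is what the loop passes to `conda activate`, which accepts a path to an environment directory as well as an environment name.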

Development Notes

Here, you can find the required layout for post-PR commit messages: