02. Quick Start
pipesnake relies on Nextflow, but the remaining infrastructure is packaged within Docker, Singularity, or Conda containers. Pick your poison.
1. Install Nextflow
- version >=23.04.1
- if you need more support, follow instructions at the top of the FAQ
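If you don't have Nextflow yet, one common route (assuming Java 11 or later is already available; see the Nextflow docs for alternatives) is the official installer script:

```bash
# Download the Nextflow launcher (requires Java 11 or later on your PATH)
curl -s https://get.nextflow.io | bash

# Put the launcher somewhere on your PATH and confirm the version is >=23.04.1
mkdir -p ~/bin && mv nextflow ~/bin/
nextflow -version
```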
2. Install Docker, Singularity, or Conda
- you can follow this tutorial to help install Singularity.
- You can use Conda both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see docs.
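If you take the Conda route for Nextflow itself, a minimal sketch using the Bioconda package (the environment name here is just an example) might be:

```bash
# Create an isolated environment containing Nextflow from Bioconda
conda create -n nextflow-env -c bioconda -c conda-forge nextflow
conda activate nextflow-env
nextflow -version
```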
3. Run the test dataset
The options below will download the pipeline and run an example dataset in just one command:
- Using docker:
nextflow run ausarg/pipesnake -profile test,docker --outdir <OUTDIR>
- Using singularity:
If you are using singularity, first use nf-core download to download the Singularity images for the necessary software before running the pipeline. If you don't already have nf-core (nf-core/tools) installed, you can do that easily in a variety of ways (e.g. conda, pip, etc); see here.
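For instance, a minimal sketch of installing nf-core/tools with pip (one of the options mentioned above) could look like:

```bash
# Install nf-core/tools into the current Python environment
pip install nf-core
nf-core --version
```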
nf-core download ausarg/pipesnake
Once you have installed pipesnake with nf-core, you can run the test.
nextflow run ausarg/pipesnake -profile test,singularity --outdir <OUTDIR>
- Using conda:
We are temporarily recommending that users not use the conda implementation due to some outstanding issues. If you'd like to try anyway, instructions are below.
If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs. If conda environment creation fails, consider using mamba to create the needed environment in the cache directory, using the same hashed names reported in the Nextflow logs.
nextflow run ausarg/pipesnake -profile test --outdir <OUTDIR> -with-conda true
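For example, one way to keep the pipeline's Conda environments in a shared cache is to set NXF_CONDA_CACHEDIR before running (the path below is only an illustration; setting conda.cacheDir in a custom config achieves the same thing):

```bash
# Reuse Conda environments across pipeline runs by caching them centrally
export NXF_CONDA_CACHEDIR=/path/to/shared/conda-cache

nextflow run ausarg/pipesnake -profile test --outdir <OUTDIR> -with-conda true
```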
4.1 Generate a sample sheet (for --input):
| sample_id | read1 | read2 | barcode1 | barcode2 | adaptor1 | adaptor2 | lineage |
|---|---|---|---|---|---|---|---|
| Sample1 | /[PATH_TO]/Sample1_A_R1.fastq | /[PATH_TO]/Sample1_A_R2.fastq | AGGTTTGAGC | TACCTGGTCG | TCAC*ATCT | ACAC*ACAC | Crocodile |
| Sample2 | /[PATH_TO]/Sample2_A_R1.fastq | /[PATH_TO]/Sample2_A_R2.fastq | CGGTGGAAGC | GTGTCTGAAG | TCAC*ATCT | ACAC*ACAC | Gecko |
| Sample3 | /[PATH_TO]/Sample3_A_R1.fastq | /[PATH_TO]/Sample3_A_R2.fastq | TACTTACTGG | GAAATCCTAC | TCAC*ATCT | ACAC*ACAC | Snake |
| Sample1 | /[PATH_TO]/Sample1_B_R1.fastq | /[PATH_TO]/Sample1_B_R2.fastq | TCACCGATAA | AGGCACACTC | TCAC*ATCT | ACAC*ACAC | Crocodile |
- The sample sheet must have the above headers, but additional columns (e.g. notes) are ok to include, though they will not be read. A single entry (row) corresponds to a pair of sequence read files (R1 & R2) for the same sample, but an individual sample may have multiple entries (see Sample1). read1 and read2 must indicate the absolute path to the read files. The * in adaptor sequences indicates the placement of the barcode sequence. Information about standard Illumina adaptors and trimming can be found here. Finally, the lineage designation is what you would like that sample to ultimately be called in output alignments, locus trees, and the species tree. Save your sample sheet as a comma-separated .csv file (see the example after this list).
- AusARG Datasets: use the BPA_process_metadata.py script to generate your sample sheet from within your downloaded BPA metadata directory.
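As a concrete illustration (the file name and [PATH_TO] placeholders are examples only), the rows from the table above written out as a comma-separated sample sheet would look like this:

```bash
# Write the example rows from the table above to samplesheet.csv
# ([PATH_TO] stands in for your own absolute paths)
cat > samplesheet.csv <<'EOF'
sample_id,read1,read2,barcode1,barcode2,adaptor1,adaptor2,lineage
Sample1,/[PATH_TO]/Sample1_A_R1.fastq,/[PATH_TO]/Sample1_A_R2.fastq,AGGTTTGAGC,TACCTGGTCG,TCAC*ATCT,ACAC*ACAC,Crocodile
Sample2,/[PATH_TO]/Sample2_A_R1.fastq,/[PATH_TO]/Sample2_A_R2.fastq,CGGTGGAAGC,GTGTCTGAAG,TCAC*ATCT,ACAC*ACAC,Gecko
Sample3,/[PATH_TO]/Sample3_A_R1.fastq,/[PATH_TO]/Sample3_A_R2.fastq,TACTTACTGG,GAAATCCTAC,TCAC*ATCT,ACAC*ACAC,Snake
Sample1,/[PATH_TO]/Sample1_B_R1.fastq,/[PATH_TO]/Sample1_B_R2.fastq,TCACCGATAA,AGGCACACTC,TCAC*ATCT,ACAC*ACAC,Crocodile
EOF
```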
4.2 Generate a targets file (for --blat_db):
- The target sequence file is simply a FASTA file of your focal loci. Locus names must be unique, and ideally the target sequence data is not too divergent from your samples (though BLAT is quite flexible). An example targets file is included in SqCL_Targets.fasta, and is appropriate for use with SqCL projects.
>RAG1
TATGTTCAAATGTCCTTGGAAAACTTCTGTCT...
>AHE-L1
AACTTATACAAATCTTGGATGCCATGGATCCA...
>UCE-1520
ACAGAGGTCGATATACCGTAGAAGATGTCCAG...
...
4.3 Generate a filtering file (for --filter):
- The filtering sequences file is just another FASTA file of your focal loci, but from phylogenetically close samples (high similarity, e.g. intra-family). This is optional, but may be useful for speeding up the assembly step. These sequences are used as a reference against which the raw reads are quickly (and loosely) mapped in order to exclude off-target sequences that would otherwise slow down the assembly. Locus names do not have to be unique, and redundant targets from different taxa may improve filtration.
>RAG1 boa
TGTGTTCAAATGTCCTTGGAAAACTTCTGTCT...
>RAG1 python
TATGTTCAAATGTCCTTGGAAAACTTCTGTCT...
>AHE-L1 boa
ATCTTATACAAATCTTGGATGCCATGGATCCA...
>AHE-L1 python
AACTTATACAAATCTTGGATGCCATGGATCCA...
...
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (the value passed to -profile in the example commands). You can chain multiple config profiles in a comma-separated string.
- The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
nextflow run ausarg/pipesnake --input samplesheet.csv --outdir <OUTDIR> --blat_db <TARGET_SEQUENCES> --disable_filter false --filter <FILTER_SEQUENCES> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
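As a hypothetical end-to-end example tying together the files from sections 4.1-4.3 (the file names and output directory are placeholders, and singularity is just one of the supported profiles):

```bash
# Example invocation with a sample sheet, target loci, and an optional filter file
nextflow run ausarg/pipesnake \
    --input samplesheet.csv \
    --outdir results \
    --blat_db SqCL_Targets.fasta \
    --disable_filter false \
    --filter filter_sequences.fasta \
    -profile singularity
```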