
02. Quick Start

Ian Brennan edited this page Sep 3, 2024 · 1 revision

pipesnake relies on Nextflow, but the remaining software is packaged within Docker or Singularity containers, or Conda environments. Pick your poison.


1. Install Nextflow

  • version >=23.04.1
  • if you need more support, follow instructions at the top of the FAQ
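If you want to verify an existing installation against the minimum version, a small shell helper works (a sketch; `ver_ge` is a hypothetical helper name, and it relies on `sort -V`, available on most Linux systems):

```shell
# ver_ge A B: succeed if version A >= version B (relies on sort -V)
ver_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the version reported by `nextflow -version` against the minimum
if ver_ge "23.10.0" "23.04.1"; then
  echo "Nextflow version is new enough"
fi
```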

2. Install Docker, Singularity, or Conda

  • You can follow this tutorial to help install Singularity.
  • You can use Conda both to install Nextflow itself and to manage software within pipelines. Please only use it within pipelines as a last resort; see docs.

3. Download pipesnake

The options below will download the pipeline and run an example dataset in a single command:

  • Using docker:
nextflow run ausarg/pipesnake -profile test,docker --outdir <OUTDIR>
  • Using singularity:

If you are using Singularity, first use nf-core download to fetch the Singularity images for the necessary software before running the pipeline. If you don't already have nf-core (nf-core/tools) installed, you can install it easily in a variety of ways (e.g. conda, pip); see here.

nf-core download ausarg/pipesnake

Once you have downloaded pipesnake with nf-core, you can run the test:

nextflow run ausarg/pipesnake -profile test,singularity --outdir <OUTDIR>
  • Using conda:

We are temporarily recommending that users not use the conda implementation due to some outstanding issues. If you'd like to try anyway, instructions are below.

If you are using Conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs. If Conda environment creation fails, consider using mamba to create the needed environment in the cache directory, using the same hashed names reported in the Nextflow logs.
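For example, you could set the cache location before launching the pipeline (the path below is only a placeholder):

```shell
# Placeholder path: any central, writable directory will do
export NXF_CONDA_CACHEDIR="$HOME/.nextflow/conda-cache"

# The equivalent setting in nextflow.config would be:
#   conda.cacheDir = '/path/to/conda-cache'
```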

nextflow run ausarg/pipesnake -profile test --outdir <OUTDIR> -with-conda true

4. Prepare your input files

4.1 Generate a sample sheet (for --input):

sample_id read1 read2 barcode1 barcode2 adaptor1 adaptor2 lineage
Sample1 /[PATH_TO]/Sample1_A_R1.fastq /[PATH_TO]/Sample1_A_R2.fastq AGGTTTGAGC TACCTGGTCG TCAC*ATCT ACAC*ACAC Crocodile
Sample2 /[PATH_TO]/Sample2_A_R1.fastq /[PATH_TO]/Sample2_A_R2.fastq CGGTGGAAGC GTGTCTGAAG TCAC*ATCT ACAC*ACAC Gecko
Sample3 /[PATH_TO]/Sample3_A_R1.fastq /[PATH_TO]/Sample3_A_R2.fastq TACTTACTGG GAAATCCTAC TCAC*ATCT ACAC*ACAC Snake
Sample1 /[PATH_TO]/Sample1_B_R1.fastq /[PATH_TO]/Sample1_B_R2.fastq TCACCGATAA AGGCACACTC TCAC*ATCT ACAC*ACAC Crocodile
  • The sample sheet must have the above headers; additional columns (e.g. notes) may be included but will not be read.
  • A single entry (row) corresponds to a pair of sequence read files (R1 & R2) for the same sample, but an individual sample may have multiple entries (see Sample1).
  • read1 and read2 must give the absolute path to the read files.
  • The * in adaptor sequences indicates the placement of the barcode sequence. Information about standard Illumina adaptors and trimming can be found here.
  • The lineage designation is what you would like that sample to ultimately be called in output alignments, locus trees, and the species tree.
  • Save your sample sheet as a comma-separated .csv file.
  • AusARG Datasets: use the BPA_process_metadata.py script to generate your sample sheet from within your downloaded BPA metadata directory.
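As a sketch, a minimal one-sample sheet can be written by hand; the paths, barcodes, and lineage below are placeholders, but the header row must match exactly:

```shell
# Write a minimal one-sample sheet (placeholder paths and barcodes)
cat > samplesheet.csv <<'EOF'
sample_id,read1,read2,barcode1,barcode2,adaptor1,adaptor2,lineage
Sample1,/data/Sample1_A_R1.fastq,/data/Sample1_A_R2.fastq,AGGTTTGAGC,TACCTGGTCG,TCAC*ATCT,ACAC*ACAC,Crocodile
EOF

# Sanity check: every row should have exactly 8 comma-separated fields
awk -F',' 'NF != 8 {print "bad row: " $0; exit 1}' samplesheet.csv
```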

4.2 Generate a targets file (for --blat_db):

  • The target sequence file is simply a FASTA file of your focal loci. Locus names must be unique, and ideally the target sequence data is not too divergent from your samples (though BLAT is quite flexible). An example targets file is included in SqCL_Targets.fasta, and is appropriate for use with SqCL projects.
>RAG1  
TATGTTCAAATGTCCTTGGAAAACTTCTGTCT...  
>AHE-L1  
AACTTATACAAATCTTGGATGCCATGGATCCA...
>UCE-1520
ACAGAGGTCGATATACCGTAGAAGATGTCCAG...
...
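Since locus names must be unique, it can be worth checking your targets file for duplicates before running; a one-liner such as this (with `targets.fasta` standing in for your own file) prints any repeated names, so empty output is good:

```shell
# List any duplicated locus names in a targets FASTA; empty output is good
grep '^>' targets.fasta | awk '{print $1}' | sort | uniq -d
```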

4.3 Generate a filtering file (for --filter):

  • The filtering sequences file is another FASTA file of your focal loci, but from phylogenetically close samples (high similarity, e.g. intra-family). This is optional, but may speed up the assembly step: the raw reads are quickly (and loosely) mapped against these sequences to exclude off-target reads that would otherwise slow down assembly. Locus names do not have to be unique, and redundant targets from different taxa may improve filtration.
>RAG1  boa
TGTGTTCAAATGTCCTTGGAAAACTTCTGTCT...  
>RAG1  python
TATGTTCAAATGTCCTTGGAAAACTTCTGTCT... 
>AHE-L1  boa
ATCTTATACAAATCTTGGATGCCATGGATCCA...
>AHE-L1  python
AACTTATACAAATCTTGGATGCCATGGATCCA...  
...
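Because redundant locus names are allowed (and can help) here, a quick per-locus count shows how many filtering sequences cover each locus (`filter.fasta` is a placeholder name):

```shell
# Count filtering sequences per locus name
grep '^>' filter.fasta | awk '{print $1}' | sort | uniq -c
```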

5. Run your own analysis

Note that some form of configuration is needed so that Nextflow knows how to fetch the required software. This is usually done with a config profile (e.g. test,docker in the example commands above). You can chain multiple config profiles in a comma-separated string.

  • The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
nextflow run ausarg/pipesnake --input samplesheet.csv --outdir <OUTDIR> --blat_db <TARGET_SEQUENCES> --disable_filter false --filter <FILTER_SEQUENCES> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>