# TELSVirus


A Snakemake workflow for viral strain detection.

## Requirements

All dependencies are managed through conda environments included in the repository.

The workflow is currently built around basecalled Nanopore sequencing output. This does not mean it cannot work for PacBio sequencing data, but it has not been tested on such data.

## Install Snakemake and Clone the Repository

Create the `telsvirus` environment using conda:

```shell
conda create -c conda-forge -c bioconda -c anaconda -n telsvirus snakemake git git-lfs
```

Activate the environment and clone the repository:

```shell
conda activate telsvirus
git clone https://github.com/jonathan-bravo/TELSVirus.git
```

## Update Config

Instructions on updating the configuration can be found here.

## Usage on Local Desktop or Interactive HPC Run

Make sure to update the `cores` value in the local or hpc profile, located at `workflow/profiles/local/config.yaml` or `workflow/profiles/hpc/config.yaml`, if a different number of CPU cores is available on your system.

| Profile | Profile Variable | Default Value |
| ------- | ---------------- | ------------- |
| local   | cores            | 6             |
| hpc     | cores            | 120           |
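If you update the core count often, a small helper can rewrite the profile in place. This is a sketch, not part of the workflow; the helper name `set_profile_cores` is hypothetical, and it assumes the profile `config.yaml` contains a top-level line of the form `cores: 6`.

```shell
# Hypothetical helper: rewrite the "cores:" line of a profile config.yaml.
# Assumes the file contains a top-level line like "cores: 6".
set_profile_cores() {
    profile_yaml=$1  # e.g. workflow/profiles/local/config.yaml
    new_cores=$2
    sed -i "s/^cores:.*/cores: ${new_cores}/" "$profile_yaml"
}
```

For example, `set_profile_cores workflow/profiles/local/config.yaml 8` would set the local profile to 8 cores.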

Running the workflow locally:

```shell
cd TELSVirus
snakemake --profile workflow/profiles/local
```

Running the workflow on an HPC interactively:

```shell
cd TELSVirus
snakemake --profile workflow/profiles/hpc
```

## Usage on Slurm Cluster

Make sure to update the email, account, and qos values in the slurm profile located at `workflow/profiles/slurm/config.yaml`:

```yaml
default-resources:
  - mem_mb=32000
  - account=
  - qos=
  - email=
  - mail_type="NONE"
```

Make sure all string values are surrounded by double quotes ("").
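For example, a filled-in profile might look like the following; the account, qos, and email values here are placeholders, not real settings:

```yaml
default-resources:
  - mem_mb=32000
  - account="my-lab"
  - qos="my-lab-b"
  - email="user@example.edu"
  - mail_type="NONE"
```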

Move `run.sh` from the resources directory up one level:

```shell
mv resources/run.sh .
```

Make sure to edit the email and time if necessary for your run. (I believe the email is necessary for batch runs.)

```shell
#SBATCH --mail-user=<email>
#SBATCH --time=24:00:00
```

Launching a SLURM job for the workflow:

```shell
cd TELSVirus

# Run the workflow
sbatch run.sh
```

## Test Data

A negative and a positive sample are included in `resources/test/reads/`.

NOTE: git-lfs is a requirement for the test data to work. Without it, the FASTA and FASTQ files come through as git-lfs pointer stubs and will cause the workflow to error out.
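A quick way to check whether the test files were fetched properly: un-fetched git-lfs pointer stubs are small text files that begin with a `version https://git-lfs...` line instead of sequence data. The `is_lfs_pointer` helper below is a sketch for this check, not part of the workflow:

```shell
# Hypothetical helper: detect an un-fetched git-lfs pointer stub.
# Pointer files start with "version https://git-lfs.github.com/spec/v1".
is_lfs_pointer() {
    head -n 1 "$1" | grep -q '^version https://git-lfs'
}
```

If a file turns out to still be a pointer, running `git lfs pull` inside the clone should fetch the real data.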

## Output

| Name | Content |
| ---- | ------- |
| `on_target_stats.tsv` | A file containing a row per sample with the number of input reads, the number of reads mapped to the host, the number of reads mapped to the viral database, the number of unmapped reads, the host read percent, and the on-target percent. |
| `{sample}_add_sample_info.done` | A flag file ensuring the run ID and sample ID are added to all metric files. |
| `{sample}_chimeric_count.txt` | A file that contains a single count of reads that were split as chimeras during trimming. |
| `{sample}_dedup.fastq.gz` | The deduplicated input reads. |
| `{sample}_dup_reads.fastq.gz` | The reads removed during deduplication. |
| `{sample}_duplicates.txt` | The IDs of reads considered duplicates. |
| `{sample}_find_duplicates.done` | A flag file ensuring deduplication is finished. |
| `{sample}_hard_trim_count.txt` | A count of reads that were removed from analysis for being too short. |
| `{sample}_non_host.fastq.gz` | Deduplicated reads with host reads removed. |
| `{sample}_post_dedup_rl.tsv` | A file that contains read lengths after deduplication. |
| `{sample}_pre_dedup_rl.tsv` | A file that contains read lengths before deduplication. |
| `{sample}_reads_per_strain_filtered.tsv` | The number of reads that aligned to each viral strain in the viral_genomes, filtered to only strains with > 0 reads. |
| `{sample}_reads_per_strain.tsv` | The number of reads that aligned to each viral strain in the viral_genomes. |
| `{sample}_selected_viral_targets.log` | The selected viral strains from viral_genomes. A strain is selected if it has a horizontal coverage of ≥ 80%. If multiple viral accessions share the same strain, the one with the highest horizontal coverage is chosen; if the horizontal coverage is the same, the accession with the highest mean depth is chosen. |
| `{sample}_start_read_count.txt` | A file that contains a single count of reads before any processing. |
| `{sample}_stats_viruses_sorted_sftclp_REMOVED.bam` | Alignments that were removed from `{sample}_stats_viruses_sorted_sftclp.bam` for failing the soft-clip check. |
| `{sample}_trimmed.fastq.gz` | The reads after trimming. |
| `{sample}_trimmed.log` | A log containing the sequences trimmed from each read, the number of bases trimmed from each end, and the full sequence of any read that was too short and removed from further processing. |
| `{sample}_viral_target_genomes.fasta` | A FASTA file containing all viral sequences listed in the `{sample}_selected_viral_targets.log` file. |
| `{sample}_VIRAL_TARGETS_FOUND` or `{sample}_NO_VIRAL_TARGETS` | A flag file indicating whether viral targets were found. Originally used for further processing; currently informational only. |
| `{sample}_viral_targets.log` | All viral targets before applying filtering. |
| `{sample}_viruses_sorted_sftclp_REMOVED.bam` | Alignments that were removed from `{sample}_viruses_sorted_sftclp.bam` for failing the soft-clip check. |
| `{sample}_viruses_sorted_sftclp.bam` | The alignment file used for determining all viral targets. |
| `{sample}_viruses_sorted_sftclp.bam.bai` | The index file of `{sample}_viruses_sorted_sftclp.bam`. |
| `{sample}.mpileup` | Pileup file generated for determining the horizontal coverage and mean depth of viral targets. |
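As an illustration, the on-target percent in `on_target_stats.tsv` can be recomputed from the raw counts as `100 * viral_mapped / input_reads`. The sketch below assumes tab-separated columns in the order listed above (sample, input reads, host-mapped, viral-mapped, unmapped, then the percents); that column order is an assumption, not something the workflow documents:

```shell
# Hypothetical sketch: recompute on-target percent from an
# on_target_stats.tsv-style row. Assumed column order:
# sample, input, host-mapped, viral-mapped, unmapped, ...
on_target_pct() {
    awk -F'\t' '{ printf "%.2f\n", 100 * $4 / $2 }' "$@"
}
```

With no file argument the function reads stdin, so a single row can be piped through it to cross-check the reported percent.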

## Making a Workflow DAG

```shell
snakemake --forceall --rulegraph | dot -Tsvg > dag.svg
```

## Workflow DAG Image

![Workflow Image](dag.svg)
