
Updated --help #495

Open: wants to merge 1 commit into master

ACEnglish

Hello,

I found the `sniffles --help` output difficult to read. I've refactored it so that the text is narrower, and I've separated `--help` and `--example` (detailed below). This is technically a breaking change: parameters that used `tobool` as their type were all replaced with `action="store_false"`. Therefore, any pipeline with hard-coded calls such as `sniffles --qc-stdev false` would need to be updated to `sniffles --qc-stdev`.
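To make the breaking change concrete, here is a minimal argparse sketch contrasting a value-taking boolean option with a bare `store_false` flag. The `tobool` below is a hypothetical stand-in (the real Sniffles helper is not shown in this PR), `old_parser`/`new_parser` are illustrative names, and the sketch assumes the option defaulted to True.

```python
import argparse


def tobool(value: str) -> bool:
    # Hypothetical stand-in for the old `tobool` converter; the real Sniffles
    # helper may accept a different set of spellings.
    value = value.strip().lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def old_parser() -> argparse.ArgumentParser:
    # Before: the option takes an explicit value, e.g. `--qc-stdev false`.
    parser = argparse.ArgumentParser(prog="sniffles")
    parser.add_argument("--qc-stdev", type=tobool, default=True)
    return parser


def new_parser() -> argparse.ArgumentParser:
    # After: a bare flag. Passing it stores False; omitting it keeps the
    # implicit store_false default of True.
    parser = argparse.ArgumentParser(prog="sniffles")
    parser.add_argument("--qc-stdev", action="store_false")
    return parser
```

Both `old_parser().parse_args(["--qc-stdev", "false"])` and `new_parser().parse_args(["--qc-stdev"])` yield `qc_stdev=False`. With the new parser, the old spelling `--qc-stdev false` leaves `false` as an unrecognized argument, so argparse reports an error and exits with status 2, which is why hard-coded pipeline calls would need updating.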

Note that the formatting of the examples below differs somewhat from what you would see in a terminal, because GitHub applies its own formatting.
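In a terminal, argparse's default `HelpFormatter` wraps help text to the terminal width (obtained via `shutil.get_terminal_size()`), so the layout varies from one terminal to the next. One way to keep the text narrow regardless of the terminal is to pin the width in a formatter subclass. This is a minimal sketch: the 80-column width, the help-column cap of 22, and the names `NarrowHelpFormatter` and `build_parser` are assumptions for illustration, not values taken from this PR.

```python
import argparse


class NarrowHelpFormatter(argparse.HelpFormatter):
    """HelpFormatter that wraps at a fixed width instead of the terminal's."""

    def __init__(self, prog: str) -> None:
        # Assumed values: 80 columns overall, help column capped at 22.
        super().__init__(prog, max_help_position=22, width=80)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="sniffles",
        description="Sniffles2: A fast structural variant (SV) caller for "
        "long-read sequencing data",
        formatter_class=NarrowHelpFormatter,
    )
    filtering = parser.add_argument_group("SV Filtering parameters")
    filtering.add_argument(
        "--long-ins-length", metavar="N", type=int, default=2500,
        # argparse expands %(default)s, producing "(2500)" in the help text.
        help="Insertion SVs longer than this are subjected to more "
        "sensitive filtering (%(default)s)",
    )
    filtering.add_argument(
        "--long-del-length", metavar="N", type=int, default=50000,
        help="Deletion SVs longer than this are subjected to central "
        "coverage drop-based filtering. Not applicable for --mosaic "
        "(%(default)s)",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().format_help())
```

Using `%(default)s` keeps the "(default)" suffixes shown in the output below in sync with the actual defaults.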

`sniffles` output (no arguments)

usage: sniffles --input SORTED_INPUT.bam [--vcf OUTPUT.vcf] [--snf MERGEABLE_OUTPUT.snf] [--threads 4] [--mosaic]

Sniffles2: A fast structural variant (SV) caller for long-read sequencing data
Version 2.4
Contact: [email protected]

Use --help for full parameter information
Use --example for detailed usage information
sniffles: error: the following arguments are required: -i/--input
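argparse's `error()` prints only the usage string, then `prog: error: message`, and exits with status 2. One way to get pointer lines like "Use --help ..." into this no-argument output is therefore to fold them into a custom `usage=` string. The sketch below assumes that approach; the PR does not show how Sniffles actually builds this output, and `USAGE`, `build_parser`, and `run` are illustrative names.

```python
import argparse
import contextlib
import io

# Custom usage string (assumed layout) carrying the two pointer lines, so they
# are printed both by --help and by argparse's error path.
USAGE = (
    "sniffles --input SORTED_INPUT.bam [--vcf OUTPUT.vcf] "
    "[--snf MERGEABLE_OUTPUT.snf] [--threads 4] [--mosaic]\n\n"
    "Use --help for full parameter information\n"
    "Use --example for detailed usage information"
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sniffles", usage=USAGE)
    parser.add_argument("-i", "--input", metavar="IN", nargs="+", required=True)
    parser.add_argument("-v", "--vcf", metavar="OUT.vcf")
    return parser


def run(argv: list[str]) -> tuple[int, str]:
    """Parse argv, returning (exit status, captured stderr)."""
    err = io.StringIO()
    try:
        with contextlib.redirect_stderr(err):
            build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code or 0), err.getvalue()
    return 0, err.getvalue()
```

`run([])` returns status 2 with the usage (including both pointer lines) followed by the "the following arguments are required: -i/--input" error, matching the shape of the output above.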

`sniffles --help` output

usage: sniffles --input SORTED_INPUT.bam [--vcf OUTPUT.vcf] [--snf MERGEABLE_OUTPUT.snf] [--threads 4] [--mosaic]

Sniffles2: A fast structural variant (SV) caller for long-read sequencing data
Version 2.4
Contact: [email protected]

Use --help for full parameter information
Use --example for detailed usage information

options:
-h, --help show this help message and exit
--example Show example usage and exit
--version show program's version number and exit

Common parameters:
-i IN [IN ...], --input IN [IN ...]
For single-sample calling: A coordinate-sorted and indexed .bam/.cram
(BAM/CRAM format) file containing aligned reads. - OR - For multi-sample
calling: Multiple .snf files (generated before by running Sniffles2 for
individual samples with --snf)
-v OUT.vcf, --vcf OUT.vcf
VCF output filename to write the called and refined SVs to. If the given
filename ends with .gz, the VCF file will be automatically bgzipped and a
.tbi index built for it.
--snf OUT.snf Sniffles2 file (.snf) output filename to store candidates for later multi-sample calling
--reference REF.fa (Optional) Reference sequence the reads were aligned against. To enable
output of deletion SV sequences, this parameter must be set.
--tandem-repeats IN.bed
(Optional) Input .bed file containing tandem repeat annotations for the
reference genome.
--regions REG.bed (Optional) Only process the specified regions.
-c, --contig (Optional) Only process the specified contigs. May be given more than once.
--phase Determine phase for SV calls (requires the input alignments to be phased)
-t, --threads Number of parallel threads to use (4)

SV Filtering parameters:
--minsupport Min number of supporting reads for an SV to be reported (auto)
--minsupport-auto-mult
Coverage based auto-minsupport multiplier for germline mode (0.1/0.025)
--minsvlen Min SV length in bp (50)
--minsvlen-screen-ratio
Min length for SV candidates as fraction of --minsvlen (0.9)
--mapq Alignments with mapping quality lower than this value will be ignored
--no-qc, --qc-output-all
Output all SV candidates, disregarding quality control steps
--qc-stdev Apply filtering based on SV start position and length standard deviation
--qc-stdev-abs-max Max standard deviation for SV length and size in bp (500)
--qc-strand Apply filtering based on strand support of SV calls
--qc-coverage Min surrounding region coverage of SV calls (1)
--long-ins-length Insertion SVs longer than this are subjected to more sensitive filtering
(2500)
--long-del-length Deletion SVs longer than this are subjected to central coverage drop-based
filtering. Not applicable for --mosaic (50000)
--long-inv-length Inversion SVs longer than this value are not subjected to central coverage
drop-based filtering (10000)
--long-del-coverage Long deletions with central coverage higher than this value will be
filtered. Not applicable for --mosaic (0.66)
--long-dup-length Duplication SVs longer than this value are subjected to central coverage
increase-based filtering. Not applicable for --mosaic (50000)
--qc-bnd-filter-strand
Filter breakends that do not have support for both strands
--bnd-min-split-length
Min length of read splits to be considered for breakends (1000)
--long-dup-coverage Long duplications with central coverage lower than this value will be
filtered. Not applicable for --mosaic (1.33)
--max-splits-kb Additional number of splits per kilobase read sequence allowed before reads
are ignored (0.1)
--max-splits-base N Base number of splits allowed before reads are ignored (3)
--min-alignment-length
Reads with alignments shorter than this length in bp will be ignored
--phase-conflict-threshold
Max fraction of conflicting reads permitted for SV phase information to be
labelled as PASS. Only for --phase (0.1)
--detect-large-ins Infer insertions that are longer than most reads and therefore are spanned
by few alignments only.

SV Clustering parameters:
--cluster-binsize Initial screening bin size in bp (100)
--cluster-r Multiplier for SV start position standard deviation criterion in cluster
merging (2.5)
--cluster-repeat-h Multiplier for mean SV length criterion for tandem repeat cluster merging
(1.5)
--cluster-repeat-h-max
Max. merging distance based on SV length criterion for tandem repeat cluster
merging (1000)
--cluster-merge-pos Max. merging distance for insertions and deletions on the same read and
cluster in non-repeat regions (150)
--cluster-merge-len Max. size difference for merging SVs as fraction of SV length (0.33)
--cluster-merge-bnd Max. merging distance for breakend SV candidates (1000)

SV Genotyping parameters:
--genotype-ploidy Sample ploidy (2)
--genotype-error Estimated false positive rate for leads (0.05)
--sample-id Custom ID for this sample (SAMPLE)
--genotype-vcf IN.vcf
Forced calling input.vcf

Multi-Sample Calling / Combine parameters:
--combine-high-confidence
Min fraction of passed QC samples an SV needs (0.0)
--combine-low-confidence
Min fraction of present samples an SV needs (0.2)
--combine-low-confidence-abs
Min number of present samples an SV needs (2)
--combine-null-min-coverage
Min coverage for a genotype to be reported as 0/0 instead of ./. (5)
--combine-match Multiplier for maximum deviation of multiple SVs' start/end positions for
them to be combined across samples. Given by
max_dev=M*sqrt(min(SV_length_a,SV_length_b)), where M is this parameter
(250)
--combine-match-max Upper limit for the max deviation computed for --combine-match, in bp (1000)
--combine-separate-intra
Disable combination of SVs within the same sample
--combine-output-filtered
Include low-confidence / mosaic SVs in multi-calling
--combine-pair-relabel
Override low-quality genotypes when combining paired samples
--combine-pair-relabel-threshold
Genotype quality minimum before relabeling (20)
--combine-close-handles
Close .SNF file handles after each use to avoid hitting the open-files ulimit
when merging many samples.
--combine-pctseq Min alignment distance as percent of SV length to be merged. 0=off (0.7)

Output formatting parameters:
--output-rnames Output names of supporting reads in INFO/RNAME
--no-consensus Disable consensus sequence generation for insertion SV calls
--no-sort Do not sort output VCF
--no-progress Disable progress display
--quiet Disable any non-error logging
--max-del-seq-len Max deletion sequence length in output before writing as symbolic <DEL>
(50000)
--symbolic Output all SVs as symbolic
--allow-overwrite Allow overwriting existing output files

Mosaic/somatic calling mode parameters:
--mosaic Turn on mosaic calling
--mosaic-af-max Max allele frequency for which SVs are considered mosaic (0.2)
--mosaic-af-min Min allele frequency for mosaic SVs to be output (0.05)
--mosaic-qc-invdup-min-length
Min SV length for mosaic inversion and duplication SVs (500)
--mosaic-qc-coverage-max-change-frac
Max relative coverage change across breakpoints (0.1)
--mosaic-qc-strand Apply filtering based on strand support of calls
--mosaic-include-germline
Report germline SVs as well in mosaic mode

Developer parameters:
--combine-consensus Output the consensus genotype of all samples
--qc-coverage-max-change-frac F
Max relative coverage change across SV breakpoints
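The `--combine-match` help text above gives the combining tolerance as max_dev=M*sqrt(min(SV_length_a,SV_length_b)), bounded by `--combine-match-max`. A minimal sketch of that arithmetic, taking the help text literally with the listed defaults (250 and 1000); `combine_max_dev` is a hypothetical name, and treating signed lengths by magnitude is an assumption, not something the PR states.

```python
import math


def combine_max_dev(len_a: float, len_b: float,
                    m: float = 250.0, cap: float = 1000.0) -> float:
    """Max start/end deviation (bp) for combining two SVs across samples.

    Follows the help text: max_dev = M * sqrt(min(SV_length_a, SV_length_b)),
    limited by --combine-match-max.
    """
    # Assumption: lengths may be signed (e.g. negative for deletions), so the
    # magnitudes are compared.
    shorter = min(abs(len_a), abs(len_b))
    return min(m * math.sqrt(shorter), cap)
```

Taken literally, 250 * sqrt(16) = 1000, so with the defaults the 1000 bp cap applies whenever the shorter SV is at least 16 bp long, which covers every SV at or above the default `--minsvlen` of 50 bp.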

`sniffles --example` output

sniffles example commands:

Call SVs for a single sample
-> sniffles --input sorted_indexed_alignments.bam --vcf output.vcf

... OR, with CRAM input and bgzipped+tabix indexed VCF output:
-> sniffles --input sample.cram --vcf output.vcf.gz

... OR, producing only a SNF file with SV candidates:
-> sniffles --input sample1.bam --snf sample1.snf

... OR, simultaneously produce a single-sample VCF and SNF file:
-> sniffles --input sample1.bam --vcf sample1.vcf.gz --snf sample1.snf

... OR, with tandem repeat annotations, reference (for DEL sequences) and mosaic mode for detecting rare SVs:
-> sniffles --input sample1.bam --vcf sample1.vcf.gz --tandem-repeats tandem_repeats.bed --reference genome.fa --mosaic

Multi-sample calling
Step 1. Create .snf for each sample:
-> sniffles --input sample1.bam --snf sample1.snf
Step 2. Combined calling:
-> sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf

... OR, using a .tsv file containing a list of .snf files and sample ids (one sample per line):
Step 2. Combined calling:
-> sniffles --input snf_files_list.tsv --vcf multisample.vcf

Determine genotypes for a set of known SVs (force calling)
-> sniffles --input sample.bam --genotype-vcf input_known_svs.vcf --vcf output_genotypes.vcf
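argparse's built-in `version` action prints its text and exits during parsing, before required options such as `--input` are checked. A custom `argparse.Action` can give `--example` the same behavior. This is a minimal sketch: `ExampleAction`, `build_parser`, `run`, and the abbreviated `EXAMPLES` text are illustrative, and the PR does not show how Sniffles implements the option.

```python
import argparse
import contextlib
import io
import sys

# Abbreviated example text for illustration.
EXAMPLES = """sniffles example commands:

Call SVs for a single sample
-> sniffles --input sorted_indexed_alignments.bam --vcf output.vcf
"""


class ExampleAction(argparse.Action):
    """Print EXAMPLES to stdout and exit, the way action="version" behaves."""

    def __init__(self, option_strings, dest=argparse.SUPPRESS,
                 default=argparse.SUPPRESS, help="Show example usage and exit"):
        # nargs=0: the flag takes no value; default=SUPPRESS keeps it out of
        # the parsed namespace.
        super().__init__(option_strings=option_strings, dest=dest,
                         default=default, nargs=0, help=help)

    def __call__(self, parser, namespace, values, option_string=None):
        sys.stdout.write(EXAMPLES)
        parser.exit()  # status 0, like --help and --version


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sniffles")
    parser.add_argument("--example", action=ExampleAction)
    parser.add_argument("-i", "--input", metavar="IN", nargs="+", required=True)
    return parser


def run(argv):
    """Parse argv, returning (exit status, captured stdout)."""
    out = io.StringIO()
    try:
        with contextlib.redirect_stdout(out):
            build_parser().parse_args(argv)
    except SystemExit as exc:
        return int(exc.code or 0), out.getvalue()
    return 0, out.getvalue()
```

Because the action exits as soon as it is triggered, `sniffles --example` works without `--input`, just as `--help` does.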

Less wide text and separating --help and --example