Parameters summary

jmestret edited this page Sep 13, 2023 · 2 revisions

Modes

classif

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| --gtf | T | str | Reference annotation in GTF format | |
| -o/--output | F | str | Prefix for output index file | sqanti-sim |
| -d/--dir | F | str | Directory for output files | . |
| -k/--cores | F | int | Number of cores to run in parallel | 1 |

design

The design mode has three sub-modes: equal, custom and sample. They share the common arguments listed first, and each adds its own arguments.

Common arguments

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| -i/--trans_index | T | str | File with transcript information generated by SQANTI-SIM (*_index.tsv) | |
| --gtf | T | str | Complete reference annotation in GTF format | |
| -o/--output | F | str | Prefix for output files | Same as -i |
| -d/--dir | F | str | Directory for output files | . |
| -nt/--trans_number | F | int | Total number of transcripts to simulate | 30000 or same as train data |
| --ISM | F | int | Number of incomplete-splice-matches to simulate | 0 |
| --NIC | F | int | Number of novel-in-catalog to simulate | 0 |
| --NNC | F | int | Number of novel-not-in-catalog to simulate | 0 |
| --Fusion | F | int | Number of Fusion to simulate | 0 |
| --Antisense | F | int | Number of Antisense to simulate | 0 |
| --GG | F | int | Number of Genic-genomic to simulate | 0 |
| --GI | F | int | Number of Genic-intron to simulate | 0 |
| --Intergenic | F | int | Number of Intergenic to simulate | 0 |
| -k/--cores | F | int | Number of cores to run in parallel | 1 |
| -s/--seed | F | int | Randomizer seed | None |

equal arguments

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| --read_count | F | int | Number of reads to simulate | 50000 |

custom arguments

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| --nbn_known | F | float | Average read count per known transcript to simulate (the parameter 'n' of the Negative Binomial distribution) | 15 |
| --nbp_known | F | float | The parameter 'p' of the Negative Binomial distribution for known transcripts | 0.5 |
| --nbn_novel | F | float | Average read count per novel transcript to simulate (the parameter 'n' of the Negative Binomial distribution) | 5 |
| --nbp_novel | F | float | The parameter 'p' of the Negative Binomial distribution for novel transcripts | 0.5 |
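
To make the roles of 'n' and 'p' concrete, the sketch below computes the mean and variance of a Negative Binomial distribution for the default values listed above (n = 15, p = 0.5 for known transcripts; n = 5, p = 0.5 for novel ones). It assumes the "failures before n successes" convention used by numpy.random.negative_binomial; whether SQANTI-SIM uses exactly this convention is an assumption, and the function name is hypothetical.

```python
from typing import Tuple


def negative_binomial_moments(n: float, p: float) -> Tuple[float, float]:
    """Return (mean, variance) of a Negative Binomial with parameters n and p.

    Assumed convention (not confirmed for SQANTI-SIM): the variable counts
    failures before n successes, and p is the per-trial success probability.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    mean = n * (1 - p) / p
    variance = n * (1 - p) / p ** 2
    return mean, variance


# Defaults from the table above.
known_mean, known_var = negative_binomial_moments(15, 0.5)
novel_mean, novel_var = negative_binomial_moments(5, 0.5)
print(known_mean, known_var)
print(novel_mean, novel_var)
```

Under this convention the mean equals n only when p = 0.5, which matches the description of n as the average read count at the default p. Changing p away from 0.5 would shift the mean as well as the spread.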

sample arguments

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| --genome | T | str | Reference genome FASTA | |
| --expr_file/--long_reads | F | str | PacBio or ONT reads for quantification in FASTA, FASTQ or aligned SAM format | pre-trained data |
| --pb/--ont | F | | Use PacBio or ONT simulation settings | |
| --iso_complex | F | | If used, the program will approximate the expressed isoform complexity (number of isoforms per gene) | |
| --diff_exp | F | | Factor for adjusting the odds of novel and known transcripts expression assignments. A value of 0 means no bias between the two types. A higher value increases the bias towards novel transcripts having lower expression | 2 |
| --read_type | F | str | Read type for ONT expression level (if --ont). Choose between "cDNA" or "dRNA" | cDNA |

sim

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| -i/--trans_index | T | str | File with transcript information generated with SQANTI-SIM (*_index.tsv) | |
| --gtf | T | str | Complete reference annotation in GTF format | |
| --genome | T | str | Reference genome FASTA | |
| --pb/--ont | T | | Choose to simulate ONT or PacBio reads | |
| --pbsim/--isoseqsim | F | | If using --pb, choose between the PBSIM3 and IsoSeqSim simulators | pbsim |
| --read_type | F | str | Read type for NanoSim simulation. Choose between "cDNA" or "dRNA" | cDNA |
| --illumina | F | | If used, the program will simulate Illumina reads with Polyester | |
| --CAGE | F | | If used, the program will simulate a sample-specific CAGE peak BED file and automatically simulate short reads as well | |
| --long_count | F | int | Number of long reads to simulate (if not given, it will use the requested_counts from the --trans_index file) | |
| --short_count | F | int | Number of short reads to simulate (if not given, it will use the requested_counts from the --trans_index file) | |
| --nanosim_model | F | str | Directory of the pre-trained NanoSim model | |
| --pbsim_model | F | str | PBSIM3 quality score pre-trained model | |
| --isoseqsim_model | F | str | One-line tab-separated file with substitution, deletion and insertion error | |
| --CAGE_model | F | str | Directory of the pre-trained CAGE model | |
| --falseCAGE_prop | F | float | Proportion (0, 1) of simulated CAGE peaks that are not derived from actual TSS locations | 0.2 |
| -d/--dir | F | str | Directory for output files | . |
| -k/--cores | F | int | Number of cores to run in parallel | 1 |
| -s/--seed | F | int | Randomizer seed | None |
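
Several sim options depend on the chosen platform: --pbsim/--isoseqsim applies when using --pb, and --read_type is described for NanoSim (ONT) simulation. The following is a minimal sketch of a helper that assembles an argument list while checking those combinations. The helper name, the script name sqanti-sim.py, the placement of the mode name right after the script, and the example file names are all assumptions made for illustration; rejecting a mismatched option is this helper's choice, not documented tool behavior.

```python
from typing import List, Optional


def build_sim_argv(
    trans_index: str,
    gtf: str,
    genome: str,
    platform: str,
    pb_simulator: Optional[str] = None,
    read_type: Optional[str] = None,
    illumina: bool = False,
    cage: bool = False,
    cores: int = 1,
    seed: Optional[int] = None,
    script: str = "sqanti-sim.py",  # assumed entry-point name
) -> List[str]:
    """Assemble an argument list for the sim mode (hypothetical helper)."""
    if platform not in ("pb", "ont"):
        raise ValueError("platform must be 'pb' or 'ont'")
    if pb_simulator is not None:
        if platform != "pb":
            raise ValueError("a PacBio simulator only makes sense with 'pb'")
        if pb_simulator not in ("pbsim", "isoseqsim"):
            raise ValueError("pb_simulator must be 'pbsim' or 'isoseqsim'")
    if read_type is not None:
        if platform != "ont":
            raise ValueError("read_type is only used here with 'ont'")
        if read_type not in ("cDNA", "dRNA"):
            raise ValueError("read_type must be 'cDNA' or 'dRNA'")

    # Required arguments first, then the platform switch.
    argv = [script, "sim", "-i", trans_index, "--gtf", gtf,
            "--genome", genome, "--" + platform]
    if pb_simulator is not None:
        argv.append("--" + pb_simulator)
    if read_type is not None:
        argv += ["--read_type", read_type]
    if illumina:
        argv.append("--illumina")
    if cage:
        argv.append("--CAGE")
    argv += ["-k", str(cores)]
    if seed is not None:
        argv += ["-s", str(seed)]
    return argv


# Hypothetical file names, for illustration only.
example = build_sim_argv(
    "sqanti-sim_index.tsv", "reference.gtf", "genome.fa",
    platform="ont", read_type="dRNA", cage=True, cores=4, seed=42,
)
print(" ".join(example))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the list is later passed to subprocess.run.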

eval

| Parameter | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| --transcriptome | T | str | Long-read-defined transcriptome reconstructed with your pipeline in GTF, FASTA or FASTQ format | |
| -i/--trans_index | T | str | File with transcript information generated with SQANTI-SIM (*_index.tsv) | |
| --gtf | T | str | Reduced reference annotation in GTF format | |
| --genome | T | str | Reference genome FASTA | |
| -o/--output | F | str | Prefix for output index file | sqanti-sim |
| -d/--dir | F | str | Directory for output files | . |
| -e/--expression | F | str | Expression of transcript models (tab-separated file without header and with two columns: first the ID, second the quantified number of reads) | None |
| -c/--coverage | F | str | Junction coverage files (provide a single file, comma-delimited filenames, or a file pattern, e.g. "mydir/*.junctions") | None |
| --SR_bam | F | str | Directory or fofn file with the sorted BAM files of short-read RNA-Seq mapped against the genome | None |
| --short_reads | F | str | File Of File Names (fofn, space separated) with paths to FASTA or FASTQ from short-read RNA-Seq | None |
| --CAGE_peak | F | str | CAGE peak file in BED format (e.g. FANTOM5) | None |
| --fasta | F | | Use when the input to SQANTI-SIM is a FASTA/FASTQ with the isoform sequences | |
| --aligner_choice | F | str | If --fasta is used, choose the aligner to map your isoforms (minimap2, deSALT, gmap, uLTRA) | minimap2 |
| --min_support | F | int | Minimum number of supporting reads for an isoform | 3 |
| -k/--cores | F | int | Number of cores to run in parallel | 1 |
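
The -e/--expression input is described as a headerless, tab-separated file with two columns (ID and quantified number of reads). As a rough illustration of that layout, the sketch below writes and re-reads such a file with Python's csv module. The function names and transcript IDs are hypothetical, and parsing counts as floats is a choice made here, since the table does not say whether fractional counts are accepted.

```python
import csv
import os
import tempfile
from typing import Dict


def write_expression_file(counts: Dict[str, float], path: str) -> None:
    """Write 'id<TAB>count' lines with no header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for transcript_id, n_reads in counts.items():
            writer.writerow([transcript_id, n_reads])


def read_expression_file(path: str) -> Dict[str, float]:
    """Read a two-column, headerless, tab-separated expression file."""
    counts: Dict[str, float] = {}
    with open(path, newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), 1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 2:
                raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
            counts[row[0]] = float(row[1])
    return counts


# Round trip with hypothetical transcript IDs.
with tempfile.TemporaryDirectory() as tmp_dir:
    expression_path = os.path.join(tmp_dir, "expression.tsv")
    write_expression_file({"transcript_A": 12, "transcript_B": 3}, expression_path)
    round_trip = read_expression_file(expression_path)
print(round_trip)
```

Checking the column count on read catches the most likely mistake, a header row or an extra column exported by a quantification tool, before the file is passed to eval.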