Parameters summary

Modes

Parameter	Required	Type	Description	Default
--gtf	T	str	Reference annotation in GTF format
-o/--output	F	str	Prefix for output index file	sqanti-sim
-d/--dir	F	str	Directory for output files	.
-k/--cores	F	int	Number of cores to run in parallel	1
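
As an illustration of how the Required and Default columns read, the following sketch mirrors the table above with Python's `argparse`. It is not the tool's own parser: the program name `example-mode` and the file name `reference.gtf` are hypothetical, and only the flags, types and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the table above; not SQANTI-SIM's code.
    parser = argparse.ArgumentParser(prog="example-mode")
    # Required = T: the flag must be given, so there is no default.
    parser.add_argument("--gtf", required=True, type=str,
                        help="Reference annotation in GTF format")
    # Required = F: the value in the Default column is used when omitted.
    parser.add_argument("-o", "--output", type=str, default="sqanti-sim",
                        help="Prefix for output index file")
    parser.add_argument("-d", "--dir", type=str, default=".",
                        help="Directory for output files")
    parser.add_argument("-k", "--cores", type=int, default=1,
                        help="Number of cores to run in parallel")
    return parser


# Only --gtf and -k are passed; the other options fall back to their defaults.
args = build_parser().parse_args(["--gtf", "reference.gtf", "-k", "4"])
print(args.gtf, args.output, args.dir, args.cores)
```
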

This mode has three sub-modes (equal, custom and sample). The common arguments apply to all three, and each sub-mode adds its own arguments.

Common arguments

Parameter	Required	Type	Description	Default
-i/--trans_index	T	str	File with transcript information generated by SQANTI-SIM (*_index.tsv)
--gtf	T	str	Complete reference annotation in GTF format
-o/--output	F	str	Prefix for output files	Same as -i
-d/--dir	F	str	Directory for output files	.
-nt/--trans_number	F	int	Total number of transcripts to simulate	30000 or same as train data
--ISM	F	int	Number of incomplete-splice-matches to simulate	0
--NIC	F	int	Number of novel-in-catalog to simulate	0
--NNC	F	int	Number of novel-not-in-catalog to simulate	0
--Fusion	F	int	Number of fusion transcripts to simulate	0
--Antisense	F	int	Number of antisense transcripts to simulate	0
--GG	F	int	Number of genic-genomic transcripts to simulate	0
--GI	F	int	Number of genic-intron transcripts to simulate	0
--Intergenic	F	int	Number of intergenic transcripts to simulate	0
-k/--cores	F	int	Number of cores to run in parallel	1
-s/--seed	F	int	Randomizer seed	None

equal arguments

Parameter	Required	Type	Description	Default
--read_count	F	int	Number of reads to simulate	50000

custom arguments

Parameter	Required	Type	Description	Default
--nbn_known	F	float	Average read count per known transcript to simulate (the parameter 'n' of the Negative Binomial distribution)	15
--nbp_known	F	float	The parameter 'p' of the Negative Binomial distribution for known transcripts	0.5
--nbn_novel	F	float	Average read count per novel transcript to simulate (the parameter 'n' of the Negative Binomial distribution)	5
--nbp_novel	F	float	The parameter 'p' of the Negative Binomial distribution for novel transcripts	0.5
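
To see how 'n' and 'p' relate to the expected read count, the sketch below assumes the (n, p) parameterization used by NumPy's `negative_binomial` (number of failures before n successes, with success probability p). Whether SQANTI-SIM follows exactly this convention is an assumption. Under it, the mean is n(1 - p)/p, so with the default p = 0.5 the mean equals n, which agrees with the description of 'n' as the average read count. The helper name `nb_mean_var` is hypothetical.

```python
from typing import Tuple


def nb_mean_var(n: float, p: float) -> Tuple[float, float]:
    """Mean and variance of a negative binomial under the assumed (n, p) convention."""
    if n <= 0 or not 0 < p <= 1:
        raise ValueError("n must be > 0 and p must be in (0, 1]")
    mean = n * (1 - p) / p   # expected read count per transcript
    variance = mean / p      # equals n * (1 - p) / p**2
    return mean, variance


# Defaults from the table above.
known_mean, known_var = nb_mean_var(15, 0.5)
novel_mean, novel_var = nb_mean_var(5, 0.5)
print(known_mean, known_var)
print(novel_mean, novel_var)
```

Under this assumption, lowering p below 0.5 raises both the mean and the spread of the simulated counts, while p = 0.5 keeps the mean at n.
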

sample arguments

Parameter	Required	Type	Description	Default
--genome	T	str	Reference genome FASTA
--expr_file/--long_reads	F	str	PacBio or ONT reads for quantification in FASTA, FASTQ or aligned SAM format	pre-trained data
--pb/--ont	F		Use PacBio or ONT simulation settings
--iso_complex	F		If used, the program will approximate the expressed isoform complexity (number of isoforms per gene)
--diff_exp	F		Factor for adjusting the odds of novel and known transcripts expression assignments. A value of 0 means no bias between the two types. A higher value increases the bias towards novel transcripts having lower expression	2
--read_type	F	str	Read type for ONT expression level (if --ont). Choose between "cDNA" or "dRNA"	cDNA

Parameter	Required	Type	Description	Default
-i/--trans_index	T	str	File with transcript information generated with SQANTI-SIM (*_index.tsv)
--gtf	T	str	Complete reference annotation in GTF format
--genome	T	str	Reference genome FASTA
--pb/--ont	T		Choose to simulate ONT or PacBio reads
--pbsim/--isoseqsim	F		If using --pb, choose between the PBSIM3 and IsoSeqSim simulators	pbsim
--read_type	F	str	Read type for NanoSim simulation. Choose between "cDNA" or "dRNA"	cDNA
--illumina	F		If used, the program will simulate Illumina reads with Polyester
--CAGE	F		If used, the program will simulate a sample-specific CAGE peak BED file and automatically simulate short reads as well
--long_count	F	int	Number of long reads to simulate (if not given, the requested_counts from the --trans_index file are used)
--short_count	F	int	Number of short reads to simulate (if not given, the requested_counts from the --trans_index file are used)
--nanosim_model	F	str	Directory of the pre-trained NanoSim model
--pbsim_model	F	str	PBSIM3 quality score pre-trained model
--isoseqsim_model	F	str	One-line tab-separated file with substitution, deletion and insertion error rates
--CAGE_model	F	str	Directory of the pre-trained CAGE model
--falseCAGE_prop	F	float	Proportion (between 0 and 1) of simulated CAGE peaks that are not derived from actual TSS locations	0.2
-d/--dir	F	str	Directory for output files	.
-k/--cores	F	int	Number of cores to run in parallel	1
-s/--seed	F	int	Randomizer seed	None

Parameter	Required	Type	Description	Default
--transcriptome	T	str	Long-read-defined transcriptome reconstructed with your pipeline in GTF, FASTA or FASTQ format
-i/--trans_index	T	str	File with transcript information generated with SQANTI-SIM (*_index.tsv)
--gtf	T	str	Reduced reference annotation in GTF format
--genome	T	str	Reference genome FASTA
-o/--output	F	str	Prefix for output index file	sqanti-sim
-d/--dir	F	str	Directory for output files	.
-e/--expression	F	str	Expression of transcript models (tab-separated file without header and with two columns: transcript id and quantified number of reads)	None
-c/--coverage	F	str	Junction coverage files (provide a single file, comma-delimited filenames, or a file pattern, e.g. "mydir/*.junctions")	None
--SR_bam	F	str	Directory or fofn file with the sorted BAM files of short-read RNA-Seq mapped against the genome	None
--short_reads	F	str	File Of File Names (fofn, space separated) with paths to FASTA or FASTQ from Short-Read RNA-Seq	None
--CAGE_peak	F	str	CAGE peak file in BED format (e.g. FANTOM5)	None
--fasta	F		Use when the input transcriptome is a FASTA/FASTQ file with the isoform sequences
--aligner_choice	F	str	If --fasta is used, choose the aligner to map your isoforms (minimap2, deSALT, gmap, uLTRA)	minimap2
--min_support	F	int	Minimum number of supporting reads for an isoform	3
-k/--cores	F	int	Number of cores to run in parallel	1
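
The file expected by -e/--expression in the table above is described as two tab-separated columns with no header: transcript id, then quantified number of reads. A minimal sketch of reading and validating such a file before passing it to the tool could look like the following. The function name `read_expression` and the transcript ids are hypothetical, and parsing the counts as floats is an assumption made so that fractional quantifications are accepted.

```python
import csv
import io
from typing import Dict, TextIO


def read_expression(handle: TextIO) -> Dict[str, float]:
    """Parse a headerless, two-column, tab-separated expression file."""
    counts: Dict[str, float] = {}
    for line_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(row)}")
        transcript_id, n_reads = row
        # float() raises ValueError on a header line such as "id<TAB>count".
        counts[transcript_id] = float(n_reads)
    return counts


# In-memory stand-in for a real file; the ids are made up for the example.
example = io.StringIO("transcript_1\t12\ntranscript_2\t0\n")
expression = read_expression(example)
print(expression)
```
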