.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.1.
.TH SORTMERNA "1" "August 2015" "sortmerna 2.1" "User Commands"
.SH NAME
sortmerna \- tool for filtering, mapping and OTU-picking NGS reads
.SH SYNOPSIS
.B sortmerna
\fB\-\-ref\fR db.fasta,db.idx \fB\-\-reads\fR file.fa \fB\-\-aligned\fR base_name_output [OPTIONS]
.SH DESCRIPTION
.P
SortMeRNA is a biological sequence analysis tool for filtering, mapping and
OTU-picking NGS reads. The core algorithm is based on approximate seeds and
allows for fast and sensitive analyses of nucleotide sequences. The main
application of SortMeRNA is filtering rRNA from metatranscriptomic data.
Additional applications include OTU-picking and taxonomy assignment, available
through QIIME v1.9+ (http://qiime.org - v1.9.0-rc1).
.P
SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or
more rRNA database files, and separates rRNA reads from rejected reads into
two files specified by the user. Optionally, it can provide high quality local
alignments of rRNA reads against the rRNA database. SortMeRNA works with
Illumina, 454, Ion Torrent and PacBio data, and can produce SAM and
BLAST-like alignments.
.SH OPTIONS
.SS MANDATORY OPTIONS
.TP
\fB\-\-ref\fR \fISTRING,STRING\fR
FASTA reference file, index file
.br
Example:
.br
\fB\-\-ref\fR \fI\,/path/to/file1.fasta\/,/path/to/index1\fP
.br
If passing multiple reference sequence files, separate each file,index pair with ':'
.br
Example:
.br
\fB\-\-ref\fR \fI/path/f1.fasta,/path/index1:/path/f2.fasta,/path/index2\fP
.TP
\fB\-\-reads\fR \fISTRING\fR
FASTA/FASTQ reads file
.TP
\fB\-\-aligned\fR \fISTRING\fR
aligned reads filepath + base file name
(appropriate extension will be added)
.SS COMMON OPTIONS
.TP
\fB\-\-other\fR \fISTRING\fR
rejected reads filepath + base file name
(appropriate extension will be added)
.TP
\fB\-\-fastx\fR \fIBOOL\fR
output FASTA/FASTQ file (default: off,
for aligned and/or rejected reads)
.TP
\fB\-\-sam\fR \fIBOOL\fR
output SAM alignment (default: off,
for aligned reads only)
.TP
\fB\-\-SQ\fR \fIBOOL\fR
add SQ tags to the SAM file (default: off)
.TP
\fB\-\-blast\fR \fISTRING\fR
output alignments in various Blast\-like formats
.br
\&'0' \- pairwise
.br
\&'1' \- tabular (Blast \fB\-m\fR 8 format)
.br
\&'1 cigar' \- tabular + column for CIGAR
.br
\&'1 cigar qcov' \- tabular + columns for CIGAR and query coverage
.br
\&'1 cigar qcov qstrand' \- tabular + columns for CIGAR, query coverage and strand
.TP
\fB\-\-log\fR \fIBOOL\fR
output overall statistics (default: off)
.TP
\fB\-\-num_alignments\fR \fIINT\fR
report first INT alignments per read reaching E\-value (default: -1,
\fB\-\-num_alignments\fR 0 signifies all alignments will be output)
.TP
\fIor\fR (default)
.TP
\fB\-\-best\fR \fIINT\fR
report INT best alignments per read reaching E\-value (default: 1)
by searching \fB\-\-min_lis\fR \fIINT\fR candidate alignments
(\fB\-\-best\fR 0 signifies all candidate alignments will be searched)
.TP
\fB\-\-min_lis\fR \fIINT\fR
search all alignments having the first INT longest LIS (default: 2).
LIS stands for Longest Increasing Subsequence; it is
computed from the seeds' positions to expand hits into
longer matches prior to Smith\-Waterman alignment.
.TP
\fB\-\-print_all_reads\fR
output null alignment strings for non\-aligned reads
to SAM and/or BLAST tabular files (default: off)
.TP
\fB\-\-paired_in\fR \fIBOOL\fR
both paired\-end reads go in \fB\-\-aligned\fR fasta/q file (default: off,
interleaved reads only, see Section 4.2.4 of User Manual)
.TP
\fB\-\-paired_out\fR \fIBOOL\fR
both paired\-end reads go in \fB\-\-other\fR fasta/q file (default: off,
interleaved reads only, see Section 4.2.4 of User Manual)
.TP
\fB\-\-match\fR \fIINT\fR
SW score (positive integer) for a match (default: 2)
.TP
\fB\-\-mismatch\fR \fIINT\fR
SW penalty (negative integer) for a mismatch (default: -3)
.TP
\fB\-\-gap_open\fR \fIINT\fR
SW penalty (positive integer) for introducing a gap (default: 5)
.TP
\fB\-\-gap_ext\fR \fIINT\fR
SW penalty (positive integer) for extending a gap (default: 2)
.TP
\fB\-N\fR \fIINT\fR
SW penalty for ambiguous letters (N's) (default: scored as \fB\-\-mismatch\fR)
.TP
\fB\-F\fR \fIBOOL\fR
search only the forward strand (default: off)
.TP
\fB\-R\fR \fIBOOL\fR
search only the reverse\-complementary strand (default: off)
.TP
\fB\-a\fR \fIINT\fR
number of threads to use (default: 1)
.TP
\fB\-e\fR \fIDOUBLE\fR
E\-value threshold (default: 1)
.TP
\fB\-m\fR \fIINT\fR
INT Mbytes for loading the reads into memory (default: 1024,
maximum \fB\-m\fR INT is 5872)
.TP
\fB\-v\fR \fIBOOL\fR
verbose (default: off)
.SS OTU PICKING OPTIONS
.TP
\fB\-\-id\fR \fIDOUBLE\fR
%id similarity threshold (the alignment must
still pass the E\-value threshold, default: 0.97)
.TP
\fB\-\-coverage\fR \fIDOUBLE\fR
%query coverage threshold (the alignment must
still pass the E\-value threshold, default: 0.97)
.TP
\fB\-\-de_novo_otu\fR \fIBOOL\fR
FASTA/FASTQ file for reads matching database < %id
.br
(set using \fB\-\-id\fR) and < %cov (set using \fB\-\-coverage\fR)
.br
(alignment must still pass the E\-value threshold, default: off)
.TP
\fB\-\-otu_map\fR \fIBOOL\fR
output OTU map (input to QIIME's make_otu_table.py, default: off)
.SS ADVANCED OPTIONS
.P
See the SortMeRNA user manual for more details.
.TP
\fB\-\-passes\fR \fIINT\fR
three intervals at which to place the seed on the read
(L is the seed length set in indexdb_rna(1), default: L,L/2,3)
.TP
\fB\-\-edges\fR \fIINT\fR
number (or percent if INT followed by % sign) of
nucleotides to add to each edge of the read
prior to SW local alignment (default: 4)
.TP
\fB\-\-num_seeds\fR \fIINT\fR
number of seeds matched before searching for candidate LIS (default: 2)
.TP
\fB\-\-full_search\fR \fIBOOL\fR
search for all 0\-error and 1\-error seed matches in the index rather than stopping
after finding a 0\-error match (<1% gain in
sensitivity with up to four\-fold decrease in speed, default: off)
.TP
\fB\-\-pid\fR \fIBOOL\fR
add pid to output file names (default: off)
.TP
\fB\-h\fR \fIBOOL\fR
help
.TP
\fB\-\-version\fR \fIBOOL\fR
SortMeRNA version number
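.SH EXAMPLES
.P
The invocations below are a minimal sketch assembled only from the options
documented above. All file names (db.fasta, db.idx, file.fa, reads_rRNA,
reads_other) are placeholders, and the commands assume that options typed
BOOL are given as bare switches without an argument.
.P
Filter rRNA reads, writing aligned and rejected reads to separate FASTA/FASTQ
files together with overall statistics, using 4 threads:
.P
.nf
sortmerna \-\-ref db.fasta,db.idx \-\-reads file.fa \e
    \-\-aligned reads_rRNA \-\-other reads_other \e
    \-\-fastx \-\-log \-a 4
.fi
.P
Report the single best alignment per read as SAM and as BLAST\-like tabular
output with CIGAR and query coverage columns:
.P
.nf
sortmerna \-\-ref db.fasta,db.idx \-\-reads file.fa \e
    \-\-aligned reads_rRNA \-\-sam \e
    \-\-blast '1 cigar qcov' \-\-best 1
.fi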