SortMeRNA: a sequence analysis tool for filtering, mapping and clustering NGS reads.

License: GPL-3.0 (see LICENSE.txt and COPYING)

emrobe/sortmerna

 
 


sortmerna

SortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.

The core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads. The main application of SortMeRNA is filtering rRNA from metatranscriptomic data. SortMeRNA takes as input a file of reads (FASTA or FASTQ format) and one or more rRNA database files, and sorts the reads into two user-specified files: one for reads that aligned to the databases and one for reads that were rejected. Additional applications include clustering and taxonomy assignment, available through QIIME v1.9.1 (http://qiime.org). SortMeRNA works with Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.

Visit http://bioinfo.lifl.fr/RNA/sortmerna/ for more information.

Support

For questions and comments, please use the SortMeRNA forum.

Documentation

If you have Doxygen installed, you can generate the documentation by modifying the following lines in doxygen_configure.txt:

INPUT = /path/to/sortmerna/include /path/to/sortmerna/src
IMAGE_PATH = /path/to/sortmerna/algorithm

and running the following command:

doxygen doxygen_configure.txt

This command will generate a folder html in the directory from which the command was run.
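The two settings can also be filled in without editing the file by hand. The following is a minimal sketch, assuming the include, src and algorithm directories sit directly under the SortMeRNA top directory as the paths above suggest. The helper name set_doxygen_paths and the output file name in the usage note are hypothetical and not part of SortMeRNA.

```shell
#!/bin/sh
# Sketch: write a copy of a Doxygen config file in which INPUT and IMAGE_PATH
# point at a given SortMeRNA checkout. All other lines are copied unchanged.
# set_doxygen_paths is a hypothetical helper name.
set_doxygen_paths() {
    # $1 = SortMeRNA top directory
    # $2 = config file to read (e.g. doxygen_configure.txt)
    # $3 = config file to write
    awk -v dir="$1" '
        /^[ \t]*INPUT[ \t]*=/      { print "INPUT = " dir "/include " dir "/src"; next }
        /^[ \t]*IMAGE_PATH[ \t]*=/ { print "IMAGE_PATH = " dir "/algorithm"; next }
        { print }
    ' "$2" > "$3"
}
```

For example, `set_doxygen_paths "$PWD" doxygen_configure.txt doxygen_local.txt` followed by `doxygen doxygen_local.txt` would leave the original configuration file untouched.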

Getting Started

SortMeRNA can be built and run on Windows, Linux, and Mac.

There are 3 methods to install SortMeRNA:

  1. GitHub repository development version (master branch)
     • Installation instructions
  2. GitHub releases (tar balls, zip)
     • Installation instructions Linux
     • Installation instructions Mac OS
     • Installation instructions Windows OS
  3. BioInfo releases (tar balls including compiled binaries)

Option (3) is the simplest, as it provides pre-compiled binaries for various operating systems.

SortMeRNA Compilation

CMake is used to generate the build files and should be installed prior to the build. CMake distributions are available for all major operating systems. Please visit the CMake project website for download and installation instructions.

Linux OS

(1) Check that your GCC compiler is version 4.0 or above:

gcc --version

(2) Generate the build files:

mkdir -p $SMR_HOME/build/Release
pushd $SMR_HOME/build/Release
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ../..

$SMR_HOME is the top directory where sortmerna code (e.g. git repo) is located.

The above commands perform the necessary system checks, check for dependencies, and generate the Makefile.

(3) Compile and build executables:

make

The binaries are created in $SMR_HOME/build/Release/src/indexdb and $SMR_HOME/build/Release/src/sortmerna. Add the built binaries to the PATH, e.g.:

export PATH="$SMR_HOME/build/Release/src/indexdb:$SMR_HOME/build/Release/src/sortmerna:$PATH"
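The compiler check in step (1) and the PATH setup after step (3) can both be verified from a script. This is a minimal sketch with hypothetical helper names (version_at_least, report_on_path). It assumes the executables are named indexdb and sortmerna after their build directories, which the text above does not state explicitly.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of SortMeRNA.

# Succeeds if the dotted version in $1 (e.g. "4.8.5") is at least $2.$3.
version_at_least() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Prints where each given command resolves on the current PATH.
# Returns non-zero if any of them is missing.
report_on_path() {
    missing=0
    for name in "$@"; do
        if location=$(command -v "$name"); then
            echo "$name: $location"
        else
            echo "$name: not found on PATH"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use would be `version_at_least "$(gcc -dumpversion)" 4 0` before running cmake, and `report_on_path indexdb sortmerna` after exporting PATH (adjust the names if your build produces differently named executables).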

Mac OS

(1) Perform the same steps as described above for Linux.

Note: If the compiler is Clang, you will not have access to multithreading.

(2) If the compiler is LLVM-GCC, you will need to change it (see Deprecation and Removal Notice).

Set your compiler to either Clang (see Set Clang compiler for Mac OS) or the original GCC compiler (see Set GCC compiler for Mac OS).

Set Clang compiler for Mac OS

(1) Check if you have Clang installed:

clang --version

(2a) If Clang is installed, set your compiler to Clang:

export CC=clang
export CXX=clang++

(2b) If Clang is not installed, see Clang for Mac OS for installation instructions.

Set GCC compiler for Mac OS

(1) Check if you have GCC installed:

gcc --version

(2a) If GCC is installed, set your compiler to GCC:

export CC=gcc-mp-4.8
export CXX=g++-mp-4.8

(2b) If GCC is not installed, see Install GCC through MacPorts for installation instructions.

(3) Next, if you would like zlib support (reading compressed .zip and .gz FASTA/FASTQ files), Zlib should also be installed via MacPorts. See section GCC and Zlib through MacPorts for installation instructions.

(4a) If Zlib is installed and you want support for compressed files, run the configure and make scripts:

./configure --with-zlib="/opt/local"
make

(4b) Otherwise, if support for reading compressed files is not wanted:

./configure --without-zlib
make

You can define an alternative installation directory by passing --prefix=/path/to/installation/dir to configure.
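The choice between steps (4a) and (4b) can be scripted. The sketch below only prints the configure command line rather than running it. It uses the three options mentioned above (--with-zlib, --without-zlib, --prefix). The helper name configure_command and the rule "use zlib if include/zlib.h exists under the given directory" are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print the ./configure invocation for steps (4a)/(4b).
# configure_command is a hypothetical helper name.
configure_command() {
    # $1 = directory expected to contain include/zlib.h (e.g. /opt/local)
    # $2 = optional installation directory, passed as --prefix
    if [ -f "$1/include/zlib.h" ]; then
        cmd="./configure --with-zlib=$1"
    else
        cmd="./configure --without-zlib"
    fi
    if [ -n "${2:-}" ]; then
        cmd="$cmd --prefix=$2"
    fi
    echo "$cmd"
}
```

For instance, `configure_command /opt/local "$HOME/sortmerna"` prints the command to review before running it, followed by make as in the steps above.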

Install compilers, ZLIB and autoconf

NOTE: the Clang compiler on Mac (distributed through Xcode) does not support OpenMP (multithreading). A preliminary implementation of OpenMP for Clang is available at "http://clang-omp.github.io" but has not yet been incorporated into the Clang mainline. The user may follow the steps outlined at the above link to install the version of Clang with multithreading support, although this version has not yet been tested with SortMeRNA. Otherwise, the user is advised to install the original GCC compiler via MacPorts, which has full multithreading support.

Clang for Mac OS

Installing Xcode (free through the App Store) and Xcode command line tools will automatically install the latest version of Clang supported with Xcode.

After installing Xcode, the Xcode command line tools may be installed via:

Xcode -> Preferences -> Downloads

Under "Components", click to install "Command Line Tools"

GCC and Zlib through MacPorts

Assuming you have MacPorts installed, type:

sudo port selfupdate
sudo port install gcc48
sudo port install zlib

After the installation, you should find the compiler installed in /opt/local/bin/gcc-mp-4.8 and /opt/local/bin/g++-mp-4.8 as well as Zlib in /opt/local/lib/libz.dylib and /opt/local/include/zlib.h .
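The installed files listed above can be checked in one pass. This is a minimal sketch. The helper name check_macports_files is hypothetical, and the default prefix /opt/local is taken from the paths in the text.

```shell
#!/bin/sh
# Sketch: report which of the files named above exist under a MacPorts prefix.
# check_macports_files is a hypothetical helper name.
check_macports_files() {
    prefix=${1:-/opt/local}
    status=0
    for rel in bin/gcc-mp-4.8 bin/g++-mp-4.8 lib/libz.dylib include/zlib.h; do
        if [ -e "$prefix/$rel" ]; then
            echo "found   $prefix/$rel"
        else
            echo "missing $prefix/$rel"
            status=1
        fi
    done
    # Non-zero if at least one file is missing.
    return "$status"
}
```

Calling `check_macports_files` with no argument inspects /opt/local. Pass another directory if MacPorts was installed under a different prefix.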

Windows OS

MS Visual Studio Community edition and CMake for Windows are required for building SortMeRNA. Download and install VS Community edition from the Visual Studio Community website. The following assumes Visual Studio 14 2015.

Open the Windows command shell (CMD) and run:

mkdir %SMR_HOME%\build
pushd %SMR_HOME%\build
cmake -G "Visual Studio 14 2015 Win64" ..

The above generates VS project files in the %SMR_HOME%\build\ directory. It also downloads required third-party source packages such as zlib (into %SMR_HOME%\3rdparty\). %SMR_HOME% is the top directory where the SortMeRNA source distribution (e.g. Git repo) is installed.

Start Visual Studio and open the SortMeRNA solution: File -> Open -> Project/Solution, then open %SMR_HOME%\build\sortmerna.sln

Select the desired build type: Release | Debug | RelWithDebInfo | MinSizeRel. In Solution Explorer, right-click 'ALL_BUILD' and select 'build' in the pop-up menu.

Depending on the build type the binaries are generated in %SMR_HOME%\build\src\sortmerna\Release (or Debug | RelWithDebInfo | MinSizeRel).

Add the sortmerna executables to PATH:

set PATH=%SMR_HOME%\build\src\indexdb\Release;%SMR_HOME%\build\src\sortmerna\Release;%PATH%

Tests

Python code for running tests is provided in $SMR_HOME/tests (%SMR_HOME%\tests) and requires Python 3.5 or higher.

Tests can be run with the following commands:

python ./tests/test_sortmerna.py
python ./tests/test_sortmerna_zlib.py

Make sure the data folder is in the same directory as test_sortmerna.py.

scikit-bio 0.5.0 is required to run the tests.
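The layout requirement and the two test commands can be wrapped in a small script. The sketch below assumes the data folder is literally named data and sits next to the test scripts. The helper name run_sortmerna_tests is hypothetical.

```shell
#!/bin/sh
# Sketch: check that the data folder sits next to the test scripts,
# then run both scripts in order, stopping at the first failure.
# run_sortmerna_tests is a hypothetical helper name.
run_sortmerna_tests() {
    # $1 = tests directory (e.g. "$SMR_HOME/tests")
    # $2 = Python interpreter to use (defaults to "python")
    tests_dir=$1
    python_bin=${2:-python}
    if [ ! -d "$tests_dir/data" ]; then
        echo "data folder not found next to test_sortmerna.py in $tests_dir" >&2
        return 1
    fi
    for script in test_sortmerna.py test_sortmerna_zlib.py; do
        "$python_bin" "$tests_dir/$script" || return 1
    done
}
```

For example, `run_sortmerna_tests "$SMR_HOME/tests" python3` uses a specific interpreter when python does not point at Python 3.5 or higher.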

Third-party libraries

Various features in SortMeRNA are dependent on third-party libraries, including:

  • ALP: computes statistical parameters for Gumbel distribution (K and Lambda)
  • CMPH: C Minimal Perfect Hashing Library
  • KSEQ: FASTA/FASTQ parser (including compressed files)
  • PARASAIL: Pairwise Sequence Alignment Library

Wrappers and Packages

Galaxy

Thanks to Björn Grüning and Nicola Soranzo, an up-to-date Galaxy wrapper exists for SortMeRNA. Please visit Björn's github page for installation.

Debian

Thanks to the Debian Med team, SortMeRNA 2.0 is now a package in Debian. Thanks to Andreas Tille for the sortmerna and indexdb_rna man pages (version 2.0). These have been updated for 2.1 in the master repository.

GNU Guix

Thanks to Ben Woodcroft for adding SortMeRNA 2.1 to GNU Guix. Find the package here.

QIIME

SortMeRNA 2.0 can be used in QIIME's pick_closed_reference_otus.py, pick_open_reference_otus.py and assign_taxonomy.py scripts.

Note: At the moment, only 2.0 is compatible with QIIME.

Taxonomies

The archive rRNA_databases/silva_ids_acc_tax.tar.gz contains SILVA taxonomy strings (extracted from an XML file generated by ARB) for each of the reference sequences in the representative databases. Each file has three tab-separated columns: the reference sequence ID, the accession number, and the taxonomy string.
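Given the three-column layout described above, a taxonomy string can be looked up by reference sequence ID with awk. This is a minimal sketch. The helper name lookup_taxonomy is hypothetical, and the names of the files inside the archive are not given here, so the file argument is left to the reader.

```shell
#!/bin/sh
# Sketch: print "accession<TAB>taxonomy" for one reference sequence ID
# from a three-column, tab-separated taxonomy file.
# lookup_taxonomy is a hypothetical helper name.
lookup_taxonomy() {
    # $1 = reference sequence ID
    # $2 = taxonomy file extracted from the archive
    awk -F '\t' -v id="$1" '
        # Appending "" forces a string comparison even for numeric-looking IDs.
        ($1 "") == (id "") { print $2 "\t" $3; found = 1 }
        END { exit !found }   # non-zero exit status when the ID is absent
    ' "$2"
}
```

After extracting the archive with `tar -xzf rRNA_databases/silva_ids_acc_tax.tar.gz`, calling `lookup_taxonomy <reference_id> <extracted_file>` prints the matching accession and taxonomy, and returns a non-zero status if the ID is not present.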

Citation

If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Contributors

See AUTHORS for a list of contributors to this project.
