LaRA 2: Lagrangian Relaxed structural Alignment

LaRA 2 is an improved version of LaRA, a tool for sequence-structure alignment of RNA sequences. It...

  • computes all pairwise sequence-structure alignments of the input sequences
  • produces files that can be processed with T-Coffee or MAFFT to compute a multiple sequence-structure alignment
  • employs methods from combinatorial optimization to compute feasible solutions for an integer linear program
  • can read many input formats for RNA structure, e.g. Dot-bracket notation, Stockholm, Vienna format
  • uses multiple threads on your machine and therefore runs very fast
  • has a vectorized alignment kernel, which computes the results even faster
  • is based on the SeqAn library, currently version 2
  • is well-documented and easy to use

Download instructions

Clone the repository with the --recurse-submodules option to download SeqAn and Lemon as submodules.

% git clone --recurse-submodules https://github.com/seqan/lara.git

Alternatively, you can download a zip package of the repository via the green button at the top of the GitHub page. If you do so, please unzip the file into a new subdirectory named lara and download the dependencies separately.

Requirements

  • platforms: Linux, MacOS
  • compiler: gcc ≥ 5 or clang ≥ 3.8 or icc ≥ 17
  • cmake ≥ 3.8

LaRA depends on the following libraries:

  • SeqAn (included as a submodule)
  • Lemon (included as a submodule)

To process the output for multiple alignments (3 or more sequences), you need either

  • T-Coffee or
  • MAFFT

Optionally, LaRA can predict the RNA structures for you if you provide

  • ViennaRNA

Note: Users reported problems with installing ViennaRNA, so we provide some hints here.

  1. Install the GNU MPFR Library first.
  2. Exclude unnecessary components of ViennaRNA: ./configure --without-swig --without-kinfold --without-forester --without-rnalocmin --without-gsl
  3. If you have linker issues, use ./configure --disable-lto
  4. If your system supports SSE4.1 instructions, we recommend ./configure --enable-sse

If you have further suggestions, we are happy to add them here.

Build instructions

Please create a new directory and build the program for your platform.

% mkdir bin
% cd bin
% cmake ../lara
% make
% cd ..

Usage

After building the program binary, running LaRA is as simple as

% bin/lara -i sequences.fasta

With the -i parameter you can pass one of the following formats to LaRA. The filename must end with one of the specified suffixes, because the suffix determines the correct format parser.

  • FASTA sequence format (.fa, .fasta, .faa, .ffn, .fna, .frn)
  • FASTQ sequence with quality annotation (.fq, .fastq)
  • Raw sequence format (.raw)
  • EMBL sequence format (.embl)
  • Genbank sequence format (.gbk)
  • Dot-bracket notation, with support for various bracket types (.dbn)
  • Vienna format, dot-bracket without pseudoknot (.dbv)
  • Stockholm format (.sth)
  • Connectivity Table (.ct)
  • Bpseq format (.bpseq)
  • Extended Bpseq, with support for base pair probabilities (.ebpseq)
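Because the suffix selects the parser, it can help to check a filename before calling LaRA. The following is a minimal sketch, not part of LaRA: the suffixes are copied from the list above, while the function name `guess_input_format`, the format labels, and the case-sensitive matching are assumptions made for illustration.

```python
from pathlib import Path

# Suffixes copied from the list above. The labels are informal names used
# only in this sketch; they are not identifiers from LaRA itself.
SUPPORTED_SUFFIXES = {
    ".fa": "FASTA", ".fasta": "FASTA", ".faa": "FASTA",
    ".ffn": "FASTA", ".fna": "FASTA", ".frn": "FASTA",
    ".fq": "FASTQ", ".fastq": "FASTQ",
    ".raw": "Raw",
    ".embl": "EMBL",
    ".gbk": "Genbank",
    ".dbn": "Dot-bracket",
    ".dbv": "Vienna",
    ".sth": "Stockholm",
    ".ct": "Connectivity Table",
    ".bpseq": "Bpseq",
    ".ebpseq": "Extended Bpseq",
}


def guess_input_format(filename: str) -> str:
    """Return the format label for a filename, or raise ValueError.

    Assumption: the suffix is matched exactly (case-sensitive).
    """
    suffix = Path(filename).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported suffix {suffix!r} in {filename!r}")
    return SUPPORTED_SUFFIXES[suffix]
```

For example, `guess_input_format("sequences.fasta")` returns `"FASTA"`, and a file named `sequences.txt` raises a `ValueError`.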

Note that some of these formats do not contain base pair probabilities. For those, LaRA must predict the probabilities itself, which requires the ViennaRNA dependency. Alternatively, you can pass at least two dot plot files, each of which contains the base pair probabilities for a single sequence. Important: RNAfold must be executed with -p in order to produce a _dp.ps dot plot file!

% bin/lara -d seq1_dp.ps -d seq2_dp.ps

The pairwise structural alignments are printed to stdout in the T-Coffee Library format (see below). If you want to store the result in a file, please use the -w option or redirect the output.

% bin/lara -i sequences.fasta -w results.lib
% bin/lara -i sequences.fasta > results.lib

We recommend specifying the number of threads with the -j option, e.g. to execute 4 alignments in parallel. If you specify -j 0, the program tries to detect the maximum number of threads available on your machine.

% bin/lara -i sequences.fasta -j 4
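If you call LaRA from a script, the options described so far (-i, -d, -w, -j) can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper names `build_lara_command` and `run_lara` are hypothetical, the binary is assumed to be at `bin/lara` as in the examples above, and no validation of option combinations is attempted.

```python
import subprocess
from typing import List, Optional, Sequence


def build_lara_command(
    input_file: Optional[str] = None,
    dotplots: Sequence[str] = (),
    output_file: Optional[str] = None,
    threads: Optional[int] = None,
    binary: str = "bin/lara",  # assumed location, as in the examples above
) -> List[str]:
    """Assemble an argument list using only the options shown in this README."""
    cmd = [binary]
    if input_file is not None:
        cmd += ["-i", input_file]
    for dotplot in dotplots:
        # -d is repeated once per dot plot file
        cmd += ["-d", dotplot]
    if output_file is not None:
        cmd += ["-w", output_file]
    if threads is not None:
        cmd += ["-j", str(threads)]
    return cmd


def run_lara(cmd: List[str]) -> str:
    """Run the command and return what LaRA printed to stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `build_lara_command(input_file="sequences.fasta", threads=4)` yields the same arguments as the shell call above. Passing the arguments as a list avoids shell quoting problems with unusual file names.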

For a list of options, please see the help message:

% bin/lara --help

Output format

In every output format, the alignments are sorted first by the index of the first sequence and then by the index of the second sequence.

for multiple alignments with T-Coffee

The result of LaRA is a T-Coffee library file and its format is documented here. It contains the structural scores for each residue pair of each computed sequence pair. This file is the input for T-Coffee, which computes the multiple alignment based on the scores:

% bin/t_coffee -lib results.lib

for multiple alignments with MAFFT

LaRA has an additional output format that can be read by the MAFFT framework. Each pairwise alignment produces three lines: a description line that contains the two sequence ids separated by &&, followed by the two gapped sequences of the alignment, one per line.

> first id && second id
AACCG-UU
-ACCGGUU
> first id && third id
AA-CCGUU
AAGCCGUU

MAFFT invokes LaRA with the option -o pairs for receiving this output format.
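If you want to inspect this format yourself, the three-line records can be read with a few lines of Python. This is a minimal sketch, not code from LaRA or MAFFT: the function name `parse_pairs` is hypothetical, and it assumes that each gapped sequence occupies exactly one line and that sequence ids do not contain `&&`.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Parse three-line records into (first_id, second_id, row_a, row_b).

    Assumptions: one line per gapped sequence, and ids without '&&'.
    """
    content = [line.strip() for line in lines if line.strip()]
    if len(content) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")

    alignments = []
    for start in range(0, len(content), 3):
        header, row_a, row_b = content[start:start + 3]
        if not header.startswith(">") or "&&" not in header:
            raise ValueError(f"malformed description line: {header!r}")
        first_id, second_id = (p.strip() for p in header[1:].split("&&", 1))
        if len(row_a) != len(row_b):
            raise ValueError("gapped sequences differ in length")
        alignments.append((first_id, second_id, row_a, row_b))
    return alignments
```

Applied to the example above, the first record is `("first id", "second id", "AACCG-UU", "-ACCGGUU")`.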

for pairwise alignments

LaRA can produce the aligned FastA format, which is recommended for a single pairwise alignment. It looks like a normal FastA file with gap symbols in the sequences:

> first id
AACCG-UU
> second id
-ACCGGUU

To get this output format, pass the option -o fasta to the LaRA call.

LaRA prints a warning if you use this format with more than two sequences. Using this format with 3 or more sequences is possible but not recommended, because additional pairwise alignments will simply be appended to the file, and it may be hard to distinguish the pairs later. In addition, this can confuse other programs, which expect a single multiple sequence alignment as produced by MAFFT or T-Coffee.
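A single pairwise alignment in aligned FastA format is easy to post-process. The sketch below reads the records and counts matching, mismatching, and gapped columns; the function names `read_aligned_fasta` and `column_summary` are hypothetical, and the counting scheme is only one possible summary, not something LaRA reports.

```python
from typing import Dict, List, Tuple


def read_aligned_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (id, gapped_sequence) records; sequences may span several lines."""
    records: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("sequence data before the first description line")
    return [(name, seq) for name, seq in records]


def column_summary(row_a: str, row_b: str) -> Dict[str, int]:
    """Count alignment columns that are matches, mismatches, or contain a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("gapped sequences differ in length")
    gaps = sum(a == "-" or b == "-" for a, b in zip(row_a, row_b))
    matches = sum(a == b and a != "-" for a, b in zip(row_a, row_b))
    return {
        "matches": matches,
        "mismatches": len(row_a) - matches - gaps,
        "gaps": gaps,
    }
```

For the example above, `column_summary("AACCG-UU", "-ACCGGUU")` gives 6 matches, 0 mismatches, and 2 gapped columns.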

Authorship & Copyright

LaRA 2 is being developed by Jörg Winkler and Gianvito Urgese, but it incorporates a lot of work from other members of the SeqAn project.

Feedback & Updates

GitHub: You can ask questions and report bugs on the GitHub tracker. Please also subscribe and/or star us!
Newsletter: You can also follow SeqAn on Twitter to receive updates on LaRA.

Icons on this page by Austin Andrews: https://github.com/Templarian/WindowsIcons