Tutorial

Alexey Kozlov edited this page Nov 12, 2018 · 27 revisions

Intro

RAxML-NG replaces standard RAxML as well as the corresponding supercomputer version ExaML. So RAxML-NG is one single code base that scales from the laptop to the supercomputer. RAxML-NG does not (yet) support all options of standard RAxML, only the most important and frequently used ones. Some options are now implemented as stand-alone tools, e.g. phylogenetic placement (EPA).

This tutorial is based on the RAxML-NG practical taught by Alexandros Stamatakis at COME 2018. It covers the most common use cases; for information on advanced usage, see the next section.

Before you start:

  • Download and install RAxML-NG (instructions)
  • Download a toy dataset:

Getting help

If you run the RAxML-NG executable without parameters, it shows a quick usage help:

RAxML-NG v. 0.7.0git BETA released on 31.10.2018 by The Exelixis Lab.
Authors: Alexey Kozlov, Alexandros Stamatakis, Diego Darriba, Tomas Flouri, Benoit Morel.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

WARNING: This is a BETA release, please use at your own risk!

Usage: raxml-ng [OPTIONS]

Commands (mutually exclusive):
  --help                                     display help information
  --version                                  display version information
  --evaluate                                 evaluate the likelihood of a tree (with model+brlen optimization)
  --search                                   ML tree search.
  --bootstrap                                bootstrapping
  --all                                      all-in-one (ML search + bootstrapping).
  --support                                  compute bipartition support for a given reference tree (e.g., best ML tree)
                                             and a set of replicate trees (e.g., from a bootstrap analysis)
  --bsconverge                               test for bootstrapping convergence using autoMRE criterion
  --terrace                                  check whether a tree lies on a phylogenetic terrace 
  --check                                    check alignment correctness and remove empty columns/rows
  --parse                                    parse alignment, compress patterns and create binary MSA file
  --start                                    generate parsimony/random starting trees and exit
  --loglh                                    compute the likelihood of a fixed tree (no model/brlen optimization)

Input and output options:
[...]

More comprehensive documentation is available in the GitHub wiki. Further information and benchmarks can be found in the bioRxiv preprint and in Chapter 4 of Alexey's PhD thesis.

If you cannot find an answer to your question in the above sources, or if you think you found a bug, please contact us via the RAxML Google group.

  • Please use the search function before posting, since many questions have been answered before.
  • Please use the Google group, not personal e-mail, for questions about RAxML-NG. This saves everybody's time: you might get help sooner from other Exelixis lab members or your fellow users, and whoever encounters the same problem in the future will benefit from the answer.

Preparing the alignment

Before we get started, let's first check that the MSA can actually be read and doesn't contain sites with only undetermined characters, sequences with only undetermined characters, duplicate taxon names, and so on.

raxml-ng --check --msa test.fa --model GTR+G --prefix T1

Doing this check before getting started is super-important as more than 50% of all failed RAxML runs are due to tree or MSA format errors!
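As a rough illustration of one of these checks, the shell function below lists taxon names that appear more than once in a FASTA file. It is only a sketch: the function name `fasta_duplicate_names` is made up for this example, and it assumes the taxon name is the first whitespace-delimited word of each header line, which may differ from how RAxML-NG itself parses names. It does not replace `--check`.

```shell
# Hypothetical helper (not part of RAxML-NG): print every taxon name that
# occurs more than once in a FASTA file, one name per line.
# Assumption: the taxon name is the first word after ">" on a header line.
fasta_duplicate_names() {
    awk '/^>/ {
        name = substr($0, 2)          # drop the leading ">"
        sub(/[ \t].*$/, "", name)     # keep only the first word of the header
        if (++seen[name] == 2) print name   # report each duplicate once
    }' "$1"
}
```

Calling `fasta_duplicate_names test.fa` prints nothing when all names are unique.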

We will also always use --prefix to avoid overwriting previous output files.

For large alignments, we also recommend using the --parse command after (or instead of) --check:

raxml-ng --parse --msa test.fa --model GTR+G --prefix T2

In addition to the MSA sanity check, this command performs two useful operations:

  1. Compress alignment patterns and store MSA in the binary format (RAxML Binary Alignment, RBA):
NOTE: Binary MSA file created: T2.raxml.rba

Since pattern compression can take quite some time for large MSAs, loading an RBA file is (much) faster than loading FASTA or PHYLIP.

  2. Estimate memory requirements and optimal number of CPUs/threads (see Parallelization section for details)
* Estimated memory requirements                : 54 MB
* Recommended number of threads / MPI processes: 4
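The recommended thread count can be picked up by a script rather than copied by hand. The function below is a minimal sketch: the name `recommended_threads` is made up for this example, and it assumes the log file (for instance `T2.raxml.log`, following the `<prefix>.raxml.log` naming used elsewhere in this tutorial) contains the "Recommended number of threads" line in exactly the format shown above.

```shell
# Hypothetical helper (not part of RAxML-NG): extract the recommended number
# of threads from a log written by "raxml-ng --parse".
# Assumption: the log contains a line formatted like
#   * Recommended number of threads / MPI processes: 4
recommended_threads() {
    # Print only the integer that follows the last ":" on the matching line.
    sed -n 's/^\* Recommended number of threads.*: *\([0-9][0-9]*\).*$/\1/p' "$1" |
        head -n 1
}
```

The result could then be passed on to a later run, e.g. `--threads "$(recommended_threads T2.raxml.log)"`, presumably together with `--msa T2.raxml.rba` so that the binary alignment is reused.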

Tree inference

Now let's infer a tree under GTR+GAMMA with default parameters:

  raxml-ng --msa test.fa --model GTR+G --prefix T3

Bootstrapping

Tree likelihood evaluation

Another standard task is to evaluate trees, i.e., compute the likelihood of a given fixed tree topology by just optimizing model and branch length parameters on that fixed tree. This is frequently needed in model and hypothesis testing :-)

The basic option is --evaluate

With --opt-model on/off you can enable/disable model parameter optimization.

With --opt-branches on/off you can enable/disable branch length optimization.
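The two switches can be combined. The loop below is a dry-run sketch that only prints the four possible command lines instead of running them; the function name `print_evaluate_grid`, the choice of GTR+G, and the `EV_...` prefixes are placeholders picked for this example, and the tree file is the one used in the commands further down.

```shell
# Dry run: print one --evaluate command per on/off combination of
# --opt-model and --opt-branches. Nothing is executed; paste a printed
# line into the shell (or pipe the output to sh) to actually run it.
print_evaluate_grid() {
    for m in on off; do
        for b in on off; do
            echo raxml-ng --evaluate --msa test.fa --model GTR+G \
                --tree T9.raxml.bestTree \
                --opt-model "$m" --opt-branches "$b" \
                --prefix "EV_model-${m}_brlen-${b}"
        done
    done
}

print_evaluate_grid
```

Giving each combination its own prefix keeps the four log files apart, so their likelihoods can be compared afterwards.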

Let's do some small tests that also show how the likelihood improves as we add more and more free parameters to our model. We will use the best-scoring ML tree again.

Let's first evaluate it under the simplest model, Jukes-Cantor (JC):

  raxml-ng --evaluate --msa test.fa --threads 1 --model JC --tree T9.raxml.bestTree --prefix E1

Now, let's add rate heterogeneity to this:

  raxml-ng --evaluate --msa test.fa --threads 1 --model JC+G --tree T9.raxml.bestTree --prefix E2

Now let's take a simple GTR model (without rate heterogeneity):

  raxml-ng --evaluate --msa test.fa --threads 1 --model GTR --tree T9.raxml.bestTree --prefix E3

GTR with the Gamma model of rate heterogeneity, but empirical base frequencies:

  raxml-ng --evaluate --msa test.fa --threads 1 --model GTR+G+FC --tree T9.raxml.bestTree --prefix E4

And now let's also do an ML estimate of the base frequencies. How many more free parameters do we get?

  raxml-ng --evaluate --msa test.fa --threads 1 --model GTR+G+FO --tree T9.raxml.bestTree --prefix E5

Let's check the results:

  grep logLikelihood E*.raxml.log
  E1.raxml.log:[00:00:00] Tree #1, final logLikelihood: -4444.084375 <- JC
  E2.raxml.log:[00:00:00] Tree #1, final logLikelihood: -4270.170317 <- JC+GAMMA 
  E3.raxml.log:[00:00:00] Tree #1, final logLikelihood: -4280.099457 <- GTR
  E4.raxml.log:[00:00:00] Tree #1, final logLikelihood: -4075.205410 <- GTR + GAMMA + empirical base freqs
  E5.raxml.log:[00:00:00] Tree #1, final logLikelihood: -4069.207897 <- GTR + GAMMA + estimated base freqs
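To make the comparison easier to read, the sketch below prints each run's log-likelihood together with its gain over the first file given. The function name `loglh_gains` is made up for this example, and the parsing assumes the `final logLikelihood:` line format shown in the listing above.

```shell
# Hypothetical helper (not part of RAxML-NG): for each log file, print
#   <file> <TAB> <logLikelihood> <TAB> <gain over the first file>
# Assumption: each log has a line containing "final logLikelihood: <value>".
loglh_gains() {
    grep -H 'final logLikelihood' "$@" |
    awk '{
        # The value is the field right after the "logLikelihood:" token.
        for (i = 1; i < NF; i++)
            if ($i == "logLikelihood:") lh = $(i + 1)
        # grep -H prefixes every line with "<file>:"; keep the file part.
        split($1, parts, ":")
        if (NR == 1) base = lh        # first file serves as the baseline
        printf "%s\t%.6f\t%.6f\n", parts[1], lh, lh - base
    }'
}
```

Running `loglh_gains E*.raxml.log` uses E1 (JC) as the baseline, so the third column shows how many log-likelihood units each richer model gains over JC.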

Getting the best performance out of RAxML-NG

Advanced commands