sebhtml edited this page May 1, 2012 · 2 revisions
NAME
       Ray - assemble genomes in parallel using the message-passing interface

SYNOPSIS
       mpiexec -np NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test

DESCRIPTION

  The Ray genome assembler is built on top of the RayPlatform, a generic plugin-based
  distributed and parallel compute engine that uses the message-passing interface (MPI)
  for communication between ranks.

  Ray targets several applications:

    - de novo genome assembly
    - de novo meta-genome assembly
    - de novo transcriptome assembly (works, but not tested a lot)
    - quantification of contig abundances
    - quantification of microbiome consortia members
    - quantification of transcript expression
    - taxonomy profiling of samples
    - gene ontology profiling of samples

       -help
              Displays this help page.

       -version
              Displays Ray version and compilation options.

  K-mer length

       -k kmerLength
              Selects the length of k-mers. The default value is 21.
              It must be odd because reverse-complement vertices are stored together.
              The maximum length is defined at compilation by MAXKMERLENGTH
              Larger k-mers utilise more memory.
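
              The odd-length requirement can be illustrated with a short Python sketch
              (not part of Ray). An odd-length k-mer can never equal its own reverse
              complement, because its middle base would have to be its own complement.
              The function names and the limit of 32 are assumptions made for this
              example; the real limit is whatever MAXKMERLENGTH was set to at compilation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(kmer):
    """Return the reverse complement of a DNA k-mer."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))

def is_valid_kmer_length(k, max_kmer_length=32):
    """Check the two constraints stated for -k: odd, and within the compiled maximum.

    max_kmer_length=32 is a placeholder for illustration, not a value taken from Ray.
    """
    return k > 0 and k % 2 == 1 and k <= max_kmer_length

# An even-length k-mer can be its own reverse complement.
assert reverse_complement("ACGT") == "ACGT"
# With an odd length, the middle base would have to pair with itself.
assert reverse_complement("ACGTA") != "ACGTA"
```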

  Inputs

       -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
              Provides two files containing paired-end reads.
              averageOuterDistance and standardDeviation are automatically computed if not provided.

       -i interleavedSequenceFile [averageOuterDistance standardDeviation]
              Provides one file containing interleaved paired-end reads.
              averageOuterDistance and standardDeviation are automatically computed if not provided.

       -s sequenceFile
              Provides a file containing single-end reads.
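
              As a sketch of how these input options combine, the following Python
              helper assembles an argument list in the shape of the SYNOPSIS. It uses
              only options documented on this page (-k, -p, -i, -s, -o). The helper
              name, its parameters, and the rank count of 8 are assumptions made for
              illustration.

```python
def build_ray_command(ranks, kmer_length, output_dir,
                      paired=(), interleaved=(), single=()):
    """Assemble an mpiexec argument list from options documented on this page."""
    command = ["mpiexec", "-np", str(ranks), "Ray", "-k", str(kmer_length)]
    for left, right in paired:
        command += ["-p", left, right]   # two files of paired-end reads
    for path in interleaved:
        command += ["-i", path]          # one file of interleaved paired-end reads
    for path in single:
        command += ["-s", path]          # one file of single-end reads
    command += ["-o", output_dir]
    return command

command = build_ray_command(
    ranks=8,  # hypothetical value for NUMBER_OF_RANKS
    kmer_length=21,
    output_dir="test",
    paired=[("l1_1.fastq", "l1_2.fastq"), ("l2_1.fastq", "l2_2.fastq")],
)
print(" ".join(command))
```

              The resulting list can be passed to subprocess.run without a shell, which
              avoids quoting problems with unusual file names.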

  Biological abundances

       -search searchDirectory
              Provides a directory containing fasta files to be searched in the de Bruijn graph.
              Biological abundances will be written to RayOutput/BiologicalAbundances
              See Documentation/BiologicalAbundances.txt

       -one-color-per-file
              Sets one color per file instead of one per sequence.
              By default, each sequence in each file has a different color.
              For files with large numbers of sequences, using one single color per file may be more efficient.

  Taxonomic profiling with colored de Bruijn graphs

       -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
              Provides a taxonomy.
              Computes and writes detailed taxonomic profiles.
              See Documentation/Taxonomy.txt for details.

       -gene-ontology OntologyTerms.txt Annotations.txt
              Provides an ontology and annotations.
              OntologyTerms.txt is fetched from http://geneontology.org
              Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier)
              See Documentation/GeneOntology.txt

  Outputs

       -o outputDirectory
              Specifies the directory for output files. Default is RayOutput

  Other outputs

       -amos
              Writes the AMOS file called RayOutput/AMOS.afg
              An AMOS file contains read positions on contigs.
              It can be opened with software that has a graphical user interface.

       -write-kmers
              Writes k-mer graph to RayOutput/kmers.txt
              The resulting file is not utilised by Ray.
              The resulting file is very large.

       -write-read-markers
              Writes read markers to disk.

       -write-seeds
              Writes seed DNA sequences to RayOutput/Rank.RaySeeds.fasta

       -write-extensions
              Writes extension DNA sequences to RayOutput/Rank.RayExtensions.fasta

       -write-contig-paths
              Writes contig paths with coverage values
              to RayOutput/Rank.RayContigPaths.txt

       -write-marker-summary
              Writes marker statistics.

  Memory usage

       -show-memory-usage
              Shows memory usage. Data is fetched from /proc on GNU/Linux
              Needs __linux__

       -show-memory-allocations
              Shows memory allocation events

  Algorithm verbosity

       -show-extension-choice
              Shows the choice made (with other choices) during the extension.

       -show-ending-context
              Shows the ending context of each extension.
              Shows the children of the vertex where extension was too difficult.

       -show-distance-summary
              Shows summary of outer distances used for an extension path.

       -show-consensus
              Shows the consensus when a choice is made.

  Assembly options (defaults work well)

       -minimum-contig-length
              Changes the minimum contig length, default is 100

       -color-space
              Runs in color-space
              Needs csfasta files. Activated automatically if csfasta files are provided.

       -minimumCoverage minimumCoverage
              Sets manually the minimum coverage.
              If not provided, it is computed by Ray automatically.

       -peakCoverage peakCoverage
              Sets manually the peak coverage.
              If not provided, it is computed by Ray automatically.

       -repeatCoverage repeatCoverage
              Sets manually the repeat coverage.
              If not provided, it is computed by Ray automatically.

  Checkpointing

       -write-checkpoints
              Writes checkpoint files.

       -read-checkpoints
              Reads checkpoint files.

       -read-write-checkpoints
              Reads and writes checkpoint files.

  Message routing for large number of cores

       -route-messages
              Enables the Ray message router. Disabled by default.
              Messages are routed so that each rank communicates directly with only a few other ranks.
              Without -route-messages, any rank can communicate directly with any other rank.
              Files generated: Routing/Connections.txt, Routing/Routes.txt and Routing/RelayEvents.txt
              and Routing/Summary.txt

       -connection-type type
              Sets the connection type for routes.
              Accepted values are debruijn, group, random, kautz and complete. Default is debruijn.
              With the type debruijn, the number of ranks must be a power of an integer.
              Examples: 256 = 16*16, 512 = 8*8*8, 49 = 7*7, and so on.
              Otherwise, use another connection type instead of debruijn.
              With the type kautz, the number of ranks n must be n=(k+1)*k^(d-1) for some k and d
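
              The rank-count constraints above can be checked before launching a job.
              The following Python sketch is an illustration and is not part of Ray. It
              assumes that a valid debruijn count is base^exponent with exponent >= 2
              (as in the examples above) and that kautz requires k >= 2 and d >= 2;
              these bounds and the function names are assumptions, not documented behaviour.

```python
def debruijn_parameters(ranks):
    """Return (base, exponent) pairs with base**exponent == ranks and exponent >= 2."""
    pairs = []
    base = 2
    while base * base <= ranks:
        value, exponent = base * base, 2
        while value < ranks:
            value *= base
            exponent += 1
        if value == ranks:
            pairs.append((base, exponent))
        base += 1
    return pairs

def kautz_parameters(ranks):
    """Return (k, d) pairs with (k + 1) * k**(d - 1) == ranks, k >= 2 and d >= 2."""
    pairs = []
    k = 2
    while (k + 1) * k <= ranks:
        value, d = (k + 1) * k, 2
        while value < ranks:
            value *= k
            d += 1
        if value == ranks:
            pairs.append((k, d))
        k += 1
    return pairs

print(debruijn_parameters(256))  # includes (16, 2), matching 256 = 16*16
print(kautz_parameters(12))      # includes (2, 3), since 12 = (2+1) * 2^(3-1)
```

              An empty list means the rank count does not fit that connection type under
              the stated assumptions, so another type should be chosen.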

       -routing-graph-degree degree
              Specifies the outgoing degree for the routing graph.

  Hardware testing

       -test-network-only
              Tests the network and returns.

       -write-network-test-raw-data
              Writes one additional file per rank detailing the network test.

  Debugging

       -run-profiler
              Runs the profiler as the code runs. By default, only granularity warnings are shown.
              Running the profiler increases running times.

       -with-profiler-details
              Shows the number of messages sent and received in each method during each time slice (epoch). Needs -run-profiler.

       -show-communication-events
              Shows all messages sent and received.

       -show-read-placement
              Shows read placement in the graph during the extension.

       -debug-bubbles
              Debugs bubble code.
              Bubbles can be due to heterozygous sites or sequencing errors or other (unknown) events

       -debug-seeds
              Debugs seed code.
              Seeds are paths in the graph that are likely unique.

       -debug-fusions
              Debugs fusion code.

       -debug-scaffolder
              Debugs the scaffolder.


FILES

  Input files

     Note: file format is determined with file extension.

     .fasta
     .fasta.gz (needs HAVE_LIBZ=y at compilation)
     .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
     .fastq
     .fastq.gz (needs HAVE_LIBZ=y at compilation)
     .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
     .sff (paired reads must be extracted manually)
     .csfasta (color-space reads)
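
     Because the format is chosen from the file extension, a misnamed file may not
     be read as intended. The Python sketch below mirrors the list above: it returns
     the matching suffix and the compilation option that the list says is needed. It
     is not Ray's own detection code; the function name and the case-sensitive match
     are assumptions.

```python
# Suffixes copied from the "Input files" list above.
KNOWN_SUFFIXES = [
    ".fasta", ".fasta.gz", ".fasta.bz2",
    ".fastq", ".fastq.gz", ".fastq.bz2",
    ".sff", ".csfasta",
]

def describe_input(filename):
    """Return (suffix, required_compile_option) or None if the suffix is not listed."""
    for suffix in KNOWN_SUFFIXES:
        if filename.endswith(suffix):
            if suffix.endswith(".gz"):
                return suffix, "HAVE_LIBZ=y"
            if suffix.endswith(".bz2"):
                return suffix, "HAVE_LIBBZ2=y"
            return suffix, None
    return None

print(describe_input("l1_1.fastq"))
print(describe_input("reads.fasta.gz"))
```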

  Output files

  Scaffolds

     RayOutput/Scaffolds.fasta
      The scaffold sequences in FASTA format
     RayOutput/ScaffoldComponents.txt
      The components of each scaffold
     RayOutput/ScaffoldLengths.txt
      The length of each scaffold
     RayOutput/ScaffoldLinks.txt
      Scaffold links

  Contigs

     RayOutput/Contigs.fasta
      Contiguous sequences in FASTA format
     RayOutput/ContigLengths.txt
      The lengths of contiguous sequences
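
     As an illustration of how the two contig files relate, the Python sketch below
     computes sequence lengths from FASTA text. The header names in the example
     are hypothetical and are not taken from real Ray output; the function name is
     also an assumption.

```python
def contig_lengths(fasta_text):
    """Map each FASTA record name (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so lengths accumulate per record.
            lengths[name] += len(line)
    return lengths

# Hypothetical two-record FASTA text; the first sequence is wrapped over two lines.
example = ">contig-0\nACGTAC\nGT\n>contig-1\nGGA\n"
print(contig_lengths(example))
```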

  Summary

     RayOutput/OutputNumbers.txt
      Overall numbers for the assembly

  de Bruijn graph

     RayOutput/CoverageDistribution.txt
      The distribution of coverage values
     RayOutput/CoverageDistributionAnalysis.txt
      Analysis of the coverage distribution
     RayOutput/degreeDistribution.txt
      Distribution of ingoing and outgoing degrees
     RayOutput/kmers.txt
      k-mer graph, required option: -write-kmers
         The resulting file is not utilised by Ray.
         The resulting file is very large.

  Assembly steps

     RayOutput/SeedLengthDistribution.txt
         Distribution of seed length
     RayOutput/Rank.OptimalReadMarkers.txt
         Read markers.
     RayOutput/Rank.RaySeeds.fasta
         Seed DNA sequences, required option: -write-seeds
     RayOutput/Rank.RayExtensions.fasta
         Extension DNA sequences, required option: -write-extensions
     RayOutput/Rank.RayContigPaths.txt
         Contig paths with coverage values, required option: -write-contig-paths

  Paired reads

     RayOutput/LibraryStatistics.txt
      Estimation of outer distances for paired reads
     RayOutput/Library.txt
         Frequencies for observed outer distances (insert size + read lengths)
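
     The relation "outer distance = insert size + read lengths" and the two values
     accepted by -p and -i (averageOuterDistance and standardDeviation) can be
     sketched in Python as follows. The histogram is made-up data held in memory;
     the layout of Library.txt is not assumed here, and whether Ray uses the
     population or the sample standard deviation is not stated, so the population
     form is an assumption.

```python
import math

def outer_distance(insert_size, left_read_length, right_read_length):
    """Outer distance of a pair: the insert plus the lengths of both reads."""
    return insert_size + left_read_length + right_read_length

def summarize(frequencies):
    """Return (mean, population standard deviation) of a {distance: count} histogram."""
    total = sum(frequencies.values())
    mean = sum(d * c for d, c in frequencies.items()) / total
    variance = sum(c * (d - mean) ** 2 for d, c in frequencies.items()) / total
    return mean, math.sqrt(variance)

# Made-up histogram of observed outer distances.
observed = {490: 1, 500: 2, 510: 1}
average, deviation = summarize(observed)
print(outer_distance(300, 100, 100), average, round(deviation, 2))
```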

  Partition

     RayOutput/NumberOfSequences.txt
         Number of reads in each file
     RayOutput/SequencePartition.txt
      Sequence partition

  Ray software

     RayOutput/RayVersion.txt
      The version of Ray
     RayOutput/RayCommand.txt
      The exact command that was provided

  AMOS

     RayOutput/AMOS.afg
      Assembly representation in AMOS format, required option: -amos

  Communication

     RayOutput/MessagePassingInterface.txt
      Number of messages sent
     RayOutput/NetworkTest.txt
      Latencies in microseconds
     RayOutput/RankNetworkTestData.txt
      Network test raw data

DOCUMENTATION

       This help page (always up-to-date)
       Manual (Portable Document Format): InstructionManual.pdf
       Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users

AUTHOR
       Written by Sébastien Boisvert.

REPORTING BUGS
       Report bugs to [email protected]
       Home page: 

COPYRIGHT
       This program is free software: you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation, version 3 of the License.

       This program is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

       You have received a copy of the GNU General Public License
       along with this program (see LICENSE).

Ray 2.0.0-rc6