Manual v2.0.0 rc6
sebhtml edited this page May 1, 2012
NAME
       Ray - assemble genomes in parallel using the message-passing interface

SYNOPSIS
       mpiexec -np NUMBER_OF_RANKS Ray -k KMERLENGTH -p l1_1.fastq l1_2.fastq -p l2_1.fastq l2_2.fastq -o test

DESCRIPTION:
       The Ray genome assembler is built on top of the RayPlatform, a generic plugin-based
       distributed and parallel compute engine that uses the message-passing interface
       for passing messages.

       Ray targets several applications:

         - de novo genome assembly
         - de novo meta-genome assembly
         - de novo transcriptome assembly (works, but not tested a lot)
         - quantification of contig abundances
         - quantification of microbiome consortia members
         - quantification of transcript expression
         - taxonomy profiling of samples
         - gene ontology profiling of samples

       -help
              Displays this help page.

       -version
              Displays Ray version and compilation options.

  K-mer length

       -k kmerLength
              Selects the length of k-mers. The default value is 21.
              It must be odd because reverse-complement vertices are stored together.
              The maximum length is defined at compilation by MAXKMERLENGTH
              Larger k-mers utilise more memory.

  Inputs

       -p leftSequenceFile rightSequenceFile [averageOuterDistance standardDeviation]
              Provides two files containing paired-end reads.
              averageOuterDistance and standardDeviation are automatically computed if not provided.

       -i interleavedSequenceFile [averageOuterDistance standardDeviation]
              Provides one file containing interleaved paired-end reads.
              averageOuterDistance and standardDeviation are automatically computed if not provided.

       -s sequenceFile
              Provides a file containing single-end reads.

  Biological abundances

       -search searchDirectory
              Provides a directory containing fasta files to be searched in the de Bruijn graph.
              Biological abundances will be written to RayOutput/BiologicalAbundances
              See Documentation/BiologicalAbundances.txt

       -one-color-per-file
              Sets one color per file instead of one per sequence.
              By default, each sequence in each file has a different color.
              For files with large numbers of sequences, using one single color per file may be more efficient.

  Taxonomic profiling with colored de Bruijn graphs

       -with-taxonomy Genome-to-Taxon.tsv TreeOfLife-Edges.tsv Taxon-Names.tsv
              Provides a taxonomy.
              Computes and writes detailed taxonomic profiles.
              See Documentation/Taxonomy.txt for details.

       -gene-ontology OntologyTerms.txt Annotations.txt
              Provides an ontology and annotations.
              OntologyTerms.txt is fetched from http://geneontology.org
              Annotations.txt is a 2-column file (EMBL_CDS handle & gene ontology identifier)
              See Documentation/GeneOntology.txt

  Outputs

       -o outputDirectory
              Specifies the directory for outputted files. Default is RayOutput

  Other outputs

       -amos
              Writes the AMOS file called RayOutput/AMOS.afg
              An AMOS file contains read positions on contigs.
              Can be opened with software that has a graphical user interface.

       -write-kmers
              Writes k-mer graph to RayOutput/kmers.txt
              The resulting file is not utilised by Ray.
              The resulting file is very large.

       -write-read-markers
              Writes read markers to disk.

       -write-seeds
              Writes seed DNA sequences to RayOutput/Rank.RaySeeds.fasta

       -write-extensions
              Writes extension DNA sequences to RayOutput/Rank.RayExtensions.fasta

       -write-contig-paths
              Writes contig paths with coverage values to RayOutput/Rank.RayContigPaths.txt

       -write-marker-summary
              Writes marker statistics.

  Memory usage

       -show-memory-usage
              Shows memory usage. Data is fetched from /proc on GNU/Linux
              Needs __linux__

       -show-memory-allocations
              Shows memory allocation events

  Algorithm verbosity

       -show-extension-choice
              Shows the choice made (with other choices) during the extension.

       -show-ending-context
              Shows the ending context of each extension.
              Shows the children of the vertex where extension was too difficult.

       -show-distance-summary
              Shows summary of outer distances used for an extension path.

       -show-consensus
              Shows the consensus when a choice is made.
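
As an illustration of the SYNOPSIS and of the odd-length rule for -k described above, the following is a minimal Python sketch that assembles a Ray command line. The function name `build_ray_command` and its parameters are hypothetical and are not part of Ray; only the flags `-np`, `-k`, `-p` and `-o` come from this manual. The sketch checks only that the k-mer length is odd, because the MAXKMERLENGTH limit is set at compilation and is not known here.

```python
def build_ray_command(ranks, kmer_length, paired_files, output_directory="RayOutput"):
    """Return an argument list for launching Ray (hypothetical helper, not part of Ray).

    paired_files is a list of (leftSequenceFile, rightSequenceFile) tuples,
    each of which becomes one -p option.
    """
    # The manual states that the k-mer length must be odd.
    if kmer_length % 2 == 0:
        raise ValueError("k-mer length must be odd, got %d" % kmer_length)

    command = ["mpiexec", "-np", str(ranks), "Ray", "-k", str(kmer_length)]
    for left, right in paired_files:
        command += ["-p", left, right]
    command += ["-o", output_directory]
    return command


if __name__ == "__main__":
    # Reproduces the shape of the SYNOPSIS line with two paired-end libraries.
    libraries = [("l1_1.fastq", "l1_2.fastq"), ("l2_1.fastq", "l2_2.fastq")]
    print(" ".join(build_ray_command(16, 31, libraries, "test")))
```

The returned list could be passed to `subprocess.run` on a machine where `mpiexec` and `Ray` are installed; building a list rather than a single string avoids shell quoting problems with file names.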

  Assembly options (defaults work well)

       -minimum-contig-length
              Changes the minimum contig length, default is 100

       -color-space
              Runs in color-space
              Needs csfasta files. Activated automatically if csfasta files are provided.

       -minimumCoverage minimumCoverage
              Sets manually the minimum coverage.
              If not provided, it is computed by Ray automatically.

       -peakCoverage peakCoverage
              Sets manually the peak coverage.
              If not provided, it is computed by Ray automatically.

       -repeatCoverage repeatCoverage
              Sets manually the repeat coverage.
              If not provided, it is computed by Ray automatically.

  Checkpointing

       -write-checkpoints
              Write checkpoint files

       -read-checkpoints
              Read checkpoint files

       -read-write-checkpoints
              Read and write checkpoint files

  Message routing for large number of cores

       -route-messages
              Enables the Ray message router. Disabled by default.
              Messages will be routed accordingly so that any rank can communicate directly with only a few others.
              Without -route-messages, any rank can communicate directly with any other rank.
              Files generated: Routing/Connections.txt, Routing/Routes.txt and Routing/RelayEvents.txt
              and Routing/Summary.txt

       -connection-type type
              Sets the connection type for routes.
              Accepted values are debruijn, group, random, kautz and complete. Default is debruijn.
              With the type debruijn, the number of ranks must be a power of something.
              Examples: 256 = 16*16, 512=8*8*8, 49=7*7, and so on.
              Otherwise, don't use debruijn routing but use another one
              With the type kautz, the number of ranks n must be n=(k+1)*k^(d-1) for some k and d

       -routing-graph-degree degree
              Specifies the outgoing degree for the routing graph.

  Hardware testing

       -test-network-only
              Tests the network and returns.

       -write-network-test-raw-data
              Writes one additional file per rank detailing the network test.

  Debugging

       -run-profiler
              Runs the profiler as the code runs. By default, only show granularity warnings.
              Running the profiler increases running times.

       -with-profiler-details
              Shows the number of messages sent and received in each method during each time slice (epoch).
              Needs -run-profiler.

       -show-communication-events
              Shows all messages sent and received.

       -show-read-placement
              Shows read placement in the graph during the extension.

       -debug-bubbles
              Debugs bubble code.
              Bubbles can be due to heterozygous sites or sequencing errors
              or other (unknown) events

       -debug-seeds
              Debugs seed code.
              Seeds are paths in the graph that are likely unique.

       -debug-fusions
              Debugs fusion code.

       -debug-scaffolder
              Debugs the scaffolder.

FILES

  Input files

     Note: file format is determined from the file extension.

     .fasta
     .fasta.gz (needs HAVE_LIBZ=y at compilation)
     .fasta.bz2 (needs HAVE_LIBBZ2=y at compilation)
     .fastq
     .fastq.gz (needs HAVE_LIBZ=y at compilation)
     .fastq.bz2 (needs HAVE_LIBBZ2=y at compilation)
     .sff (paired reads must be extracted manually)
     .csfasta (color-space reads)

  Outputted files

  Scaffolds

     RayOutput/Scaffolds.fasta
          The scaffold sequences in FASTA format
     RayOutput/ScaffoldComponents.txt
          The components of each scaffold
     RayOutput/ScaffoldLengths.txt
          The length of each scaffold
     RayOutput/ScaffoldLinks.txt
          Scaffold links

  Contigs

     RayOutput/Contigs.fasta
          Contiguous sequences in FASTA format
     RayOutput/ContigLengths.txt
          The lengths of contiguous sequences

  Summary

     RayOutput/OutputNumbers.txt
          Overall numbers for the assembly

  de Bruijn graph

     RayOutput/CoverageDistribution.txt
          The distribution of coverage values
     RayOutput/CoverageDistributionAnalysis.txt
          Analysis of the coverage distribution
     RayOutput/degreeDistribution.txt
          Distribution of ingoing and outgoing degrees
     RayOutput/kmers.txt
          k-mer graph, required option: -write-kmers
          The resulting file is not utilised by Ray.
          The resulting file is very large.

  Assembly steps

     RayOutput/SeedLengthDistribution.txt
          Distribution of seed length
     RayOutput/Rank.OptimalReadMarkers.txt
          Read markers.
     RayOutput/Rank.RaySeeds.fasta
          Seed DNA sequences, required option: -write-seeds
     RayOutput/Rank.RayExtensions.fasta
          Extension DNA sequences, required option: -write-extensions
     RayOutput/Rank.RayContigPaths.txt
          Contig paths with coverage values, required option: -write-contig-paths

  Paired reads

     RayOutput/LibraryStatistics.txt
          Estimation of outer distances for paired reads
     RayOutput/Library.txt
          Frequencies for observed outer distances (insert size + read lengths)

  Partition

     RayOutput/NumberOfSequences.txt
          Number of reads in each file
     RayOutput/SequencePartition.txt
          Sequence partition

  Ray software

     RayOutput/RayVersion.txt
          The version of Ray
     RayOutput/RayCommand.txt
          The exact same command provided

  AMOS

     RayOutput/AMOS.afg
          Assembly representation in AMOS format, required option: -amos

  Communication

     RayOutput/MessagePassingInterface.txt
          Number of messages sent
     RayOutput/NetworkTest.txt
          Latencies in microseconds
     RayOutput/RankNetworkTestData.txt
          Network test raw data

DOCUMENTATION

       This help page (always up-to-date)
       Manual (Portable Document Format): InstructionManual.pdf
       Mailing list archives: http://sourceforge.net/mailarchive/forum.php?forum_name=denovoassembler-users

AUTHOR
       Written by Sébastien Boisvert.

REPORTING BUGS
       Report bugs to [email protected]
       Home page:

COPYRIGHT
       This program is free software: you can redistribute it and/or modify
       it under the terms of the GNU General Public License as published by
       the Free Software Foundation, version 3 of the License.

       This program is distributed in the hope that it will be useful,
       but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
       GNU General Public License for more details.

       You have received a copy of the GNU General Public License
       along with this program (see LICENSE).

Ray 2.0.0-rc6
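
The rank-count requirements stated for -connection-type (debruijn and kautz) under "Message routing for large number of cores" can be checked before a job is submitted. The Python sketch below is only an illustration and is not taken from Ray: the function names are hypothetical, and it assumes that "a power of something" means base^exponent with an exponent of at least 2 (as in the examples 256 = 16*16 and 49 = 7*7), and that the kautz formula is of interest only for k >= 2 and d >= 2 (with d = 1 the formula is satisfied trivially by k = n - 1).

```python
def debruijn_parameters(ranks):
    """Return all (base, exponent) pairs with base**exponent == ranks and exponent >= 2."""
    pairs = []
    base = 2
    while base * base <= ranks:
        value = base * base
        exponent = 2
        # Multiply by the base until the candidate power reaches or passes the rank count.
        while value < ranks:
            value *= base
            exponent += 1
        if value == ranks:
            pairs.append((base, exponent))
        base += 1
    return pairs


def kautz_parameters(ranks):
    """Return all (k, d) pairs with (k + 1) * k**(d - 1) == ranks, k >= 2 and d >= 2."""
    pairs = []
    k = 2
    while (k + 1) * k <= ranks:
        value = (k + 1) * k  # value for d = 2
        d = 2
        while value < ranks:
            value *= k
            d += 1
        if value == ranks:
            pairs.append((k, d))
        k += 1
    return pairs


if __name__ == "__main__":
    for n in (49, 256, 512, 12, 48):
        print(n, "debruijn:", debruijn_parameters(n), "kautz:", kautz_parameters(n))
```

Under these assumptions, an empty list for a given connection type suggests picking a different rank count or another accepted value of -connection-type.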