[Overview] (#overview)
[Copyright] (#copyright)
[How to cite LEAF?] (#cite)
[Short Manual] (#manual)
LEAF is suitable for 32-bit or 64-bit machines with Linux operating systems. At least 4GB of system memory is recommended for assembling larger data sets.
Installation
The aligners Bowtie2 and BLAT are required to run LEAF.
The downloaded LEAF.cpp file can be compiled with the command:
g++ -o LEAF LEAF.cpp -lpthread
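Before compiling and running LEAF, it can help to confirm that the required tools are reachable on the PATH. The following is a minimal Python sketch of such a check. The executable names bowtie2, blat, and g++ are assumptions and may differ on a given installation.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Assumed executable names; adjust them if your installation differs.
REQUIRED = ["bowtie2", "blat", "g++"]

missing = missing_tools(REQUIRED)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```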
Inputs
- Paired-end DNA reads in FASTA format.
- De novo contigs assembled by any de novo DNA-Seq assembler (Velvet, ABySS, etc.).
- Reference genome from a closely related species.
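As a quick sanity check on the read files, the sketch below counts FASTA records by their header lines and compares the two files of a pair. It assumes that both read files should contain the same number of records, which is usual for paired-end data but is not stated in this manual. The function names are hypothetical helpers and not part of LEAF.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def reads_are_paired(path1, path2):
    """Return True if both FASTA files hold the same number of records."""
    with open(path1) as f1, open(path2) as f2:
        return count_fasta_records(f1) == count_fasta_records(f2)


# Small in-memory demonstration with made-up reads.
demo_1 = [">r1/1", "ACGT", ">r2/1", "GGCA"]
demo_2 = [">r1/2", "TTGA", ">r2/2", "CCAT"]
print(count_fasta_records(demo_1) == count_fasta_records(demo_2))  # True
```

On real data, reads_are_paired("reads_1.fa", "reads_2.fa") would perform the same comparison on the two input files.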
Using LEAF
LEAF --read1 reads_1.fa --read2 reads_2.fa --contig contigs.fa --genome genome.fa --distanceLow distanceLow --distanceHigh distanceHigh --extendedContig extendedContigs.fa --remainingContig remainingContigs.fa [--kMer k --insertVariation insertVariation --coverage coverage --noAlignment --part p]
Inputs:
- --read1 is the first reads of the paired-end (PE) DNA read pairs, in FASTA format
- --read2 is the second reads of the PE DNA read pairs, in FASTA format
- --contig is the initial contigs in FASTA format
- --genome is the reference genome in FASTA format
- --distanceLow is the lower bound of the alignment distance between the first and second reads of a PE pair
- --distanceHigh is the upper bound of the alignment distance between the first and second reads of a PE pair
Outputs:
- --extendedContig is the file of extended contigs, in FASTA format
- --remainingContig is the file of initial contigs that were not extended, in FASTA format
Options:
- --kMer is the k-mer size (default: 5)
- --insertVariation is the standard variation of insert length (default: 100)
- --coverage is the minimum coverage to keep a path in the de Bruijn graph (default: 20)
- --noAlignment skips the initial time-consuming alignment step if all the alignment files are already present in the tmp directory (default: none)
- --part is the number of parts a chromosome is divided into when it is loaded, to reduce the memory requirement (default: 1)
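The options above can be assembled programmatically, for example when running LEAF over several data sets. The sketch below only builds and prints the argument list; it does not run LEAF. The function name build_leaf_command is a hypothetical helper, and the distance values 200 and 400 are illustrative placeholders, not recommended settings.

```python
import shlex


def build_leaf_command(read1, read2, contig, genome, distance_low, distance_high,
                       extended_contig, remaining_contig, k_mer=None,
                       insert_variation=None, coverage=None,
                       no_alignment=False, part=None, executable="LEAF"):
    """Assemble the LEAF argument list from the options in the usage line."""
    cmd = [executable,
           "--read1", read1, "--read2", read2,
           "--contig", contig, "--genome", genome,
           "--distanceLow", str(distance_low),
           "--distanceHigh", str(distance_high),
           "--extendedContig", extended_contig,
           "--remainingContig", remaining_contig]
    # Optional settings are added only when given, so LEAF's defaults apply otherwise.
    for flag, value in (("--kMer", k_mer),
                        ("--insertVariation", insert_variation),
                        ("--coverage", coverage),
                        ("--part", part)):
        if value is not None:
            cmd += [flag, str(value)]
    if no_alignment:
        cmd.append("--noAlignment")
    return cmd


# Placeholder distances (200 and 400) are for illustration only.
cmd = build_leaf_command("reads_1.fa", "reads_2.fa", "contigs.fa", "genome.fa",
                         200, 400, "extendedContigs.fa", "remainingContigs.fa",
                         coverage=20)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the LEAF executable is on the PATH or its full path is given as the executable argument.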
Outputs
- Extended contigs in FASTA format.
- Remaining contigs that were not extended, in FASTA format.
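To compare the two output files, it is often enough to look at the number of contigs and their lengths. The sketch below is a minimal FASTA summary in Python. The helper names are hypothetical, and the demonstration uses made-up sequences rather than real LEAF output.

```python
def contig_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # a header line starts a new record
        elif line and lengths:
            lengths[-1] += len(line)   # a sequence may span several lines
    return lengths


def summarize(lines):
    """Report contig count, total bases, and longest contig."""
    lengths = contig_lengths(lines)
    return {"contigs": len(lengths),
            "total_bases": sum(lengths),
            "longest": max(lengths, default=0)}


# Made-up records for demonstration.
demo = [">c1", "ACGTAC", "GT", ">c2", "TTG"]
print(summarize(demo))  # {'contigs': 2, 'total_bases': 11, 'longest': 8}
```

For real output, open extendedContigs.fa or remainingContigs.fa and pass the file object to summarize.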