Verkko v1.3

@brianwalenz brianwalenz released this 05 Mar 15:19
· 454 commits to master since this release

These are the release notes for Verkko version 1.3, released on March 5, 2023. Verkko is a hybrid genome assembly pipeline developed for telomere-to-telomere assembly of accurate long reads (PacBio HiFi or Oxford Nanopore Duplex) and Oxford Nanopore ultra-long reads.

The source code distribution contains everything you need to create a binary distribution for your own specific OS. Please report any issues you encounter.

Citation

  • Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, Eichler EE, Phillippy AM, Koren S. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotech. (2023). doi:10.1038/s41587-023-01662-6

Minimum Requirements

  • 8GB minimum memory; 16GB strongly suggested
  • GCC 7 or newer (for compilation only)
  • Rust 1.58 or newer (for compilation only)
  • Python 3.5 or newer, with parasail module
  • Snakemake 7.0 or newer
  • Mashmap 2.0 or newer (for filtering known sequences)
  • GraphAligner v1.0.17 or newer
  • MBG v1.0.14 or newer
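
A quick way to see which of the tools listed above are already on your PATH is sketched below. The executable names passed to the helper are assumptions (they may differ on your system), and the helper only checks for presence, not for the minimum versions given in the list.

```shell
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command; returns 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Executable names are assumptions; adjust them to match your installation.
check_tools gcc rustc python3 snakemake mashmap GraphAligner MBG \
    || echo "install the missing tools before building or running Verkko"
```

Version numbers still need to be confirmed by hand, for example with each tool's own version option.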

Installation

Users can download Verkko as source code or install it through a package manager such as conda. The source code package must be compiled and installed before it can be used. Do NOT download the .zip source code: it is missing files and will not compile. This is a known flaw with git itself.

Run either:

conda install -c conda-forge -c bioconda -c defaults verkko

or build from source

curl -L https://github.com/marbl/verkko/releases/download/v1.3/verkko-v1.3.tar.gz --output verkko-v1.3.tar.gz
tar -xzf verkko-v1.3.tar.gz
cd verkko-v1.3/src
make -j 8
cd ..

Confirm that the MD5 checksum of the tar.gz matches the expected value:

0657abc847e3d554289d2a88a3fdf774  verkko-v1.3.tar.gz
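
The comparison can be scripted. The sketch below assumes GNU coreutils `md5sum` is available (on macOS the equivalent tool is `md5`, with a different output format); `verify_md5` is a hypothetical helper name, not part of Verkko.

```shell
# Compare a file's MD5 checksum against an expected value.
# Prints "OK: FILE" on a match; prints a message to stderr and returns 1 otherwise.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Usage with the checksum listed above:
# verify_md5 verkko-v1.3.tar.gz 0657abc847e3d554289d2a88a3fdf774
```

If the check reports a mismatch, download the archive again before building.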

Verkko will be installed in verkko-v1.3/bin. You can move the contents of verkko-v1.3/bin/* and verkko-v1.3/lib/* to a central location if you would like. If GraphAligner or MBG are not available in your path, you may also symlink them under verkko/lib/verkko/bin/
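
The symlink step might look like the sketch below. `link_tools` is a hypothetical helper, and both the destination directory and the tool locations in the usage comment are placeholders for wherever your installation and your GraphAligner and MBG binaries actually live.

```shell
# Symlink each given executable into a destination directory.
# Usage: link_tools DEST TOOL [TOOL...]
link_tools() {
    dest=$1
    shift
    mkdir -p "$dest"
    for tool in "$@"; do
        if [ ! -e "$tool" ]; then
            echo "skipping $tool: no such file" >&2
            continue
        fi
        # -f replaces an existing link with the same name.
        ln -sf "$tool" "$dest/$(basename "$tool")"
    done
}

# Placeholder paths; replace them with your real locations:
# link_tools /path/to/verkko/lib/verkko/bin /path/to/GraphAligner /path/to/MBG
```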

See the README for more details.

Changes

  • Filtering of human rDNA, mitochondrial genome and Epstein–Barr virus via option --screen human (requires mashmap). Other contaminants can be filtered using --screen <label> </path/to/single/sequence.fasta>. All contigs matching a contaminant are removed from assembly.fasta and placed in their own file. The exemplar contig is circularized and saved separately.
  • Reduce memory usage in MBG.
  • Resolve haplotype gaps when HiFi reads are available to fill the gap but are not accurate enough or long enough to be included in the MBG graph.
  • Improved GraphAligner mapping within simple-sequence repeats.
  • Fix rukki resolution homogenizing paths when no/low marker information is available.
  • Fix rukki to assign repeat tangles to their appropriate haplotypes, when possible.
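
As an illustration of the --screen option described in the first item above, the sketch below assembles two command lines and prints them instead of running them. The file names, the output directory, and the `phix` label are hypothetical. The -d, --hifi, and --nano options are assumed to follow the basic usage in the README; confirm them against `verkko --help` for your installed version.

```shell
# Hypothetical inputs; replace with your own output directory and read files.
outdir=asm
hifi=hifi.fastq.gz
ont=ont.fastq.gz

# Built-in human screen (rDNA, mitochondrial genome, EBV); requires mashmap.
screen_cmd="verkko -d $outdir --hifi $hifi --nano $ont --screen human"

# Custom contaminant: a label followed by a FASTA file holding a single sequence.
custom_cmd="verkko -d $outdir --hifi $hifi --nano $ont --screen phix /path/to/phix.fasta"

# Printed rather than executed, so the sketch is safe to paste as-is.
echo "$screen_cmd"
echo "$custom_cmd"
```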

Bug Fixes

  • Extremely slow read correction overlap computation on some datasets was resolved by counting all k-mers instead of a subset.
  • Fix various MBG bugs (https://github.com/maickrau/MBG).
  • Stop when duplicate reads are input to avoid MBG crashes later (#126).
  • Fix incorrect phasing in case of tied ONT alignments.

Known Issues

See the issues page for up-to-date open issues, or to report a problem.

  • An error in Snakefiles/7-combineConsensus.sm will cause ALL haplotype-aware assemblies (--hap-kmers) to fail with error:
    ./combineConsensus.sh: /software/lib/verkko/scripts/process_reads.py: not found
    Version v1.3.1 fixes the error, or you can fix a v1.3 installation with:
    cd /software/lib/verkko/scripts/ ; ln -s fasta_extract.py process_reads.py

  • Long runtime of MBG with very high HiFi coverage (>200x). We recommend downsampling to 100x.

  • Lost heterozygosity in simple-sequence repeats in low-heterozygosity samples. When there is no other variation within one HiFi read length of the repeat, the simple-sequence repeat difference is ignored and a consensus of both haplotypes is produced. This will be addressed in a future release.
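
For the high-coverage issue listed above, the fraction of HiFi reads to keep can be computed from the current and target coverage. The helper below is a hypothetical sketch and needs only awk. The resulting fraction would then be passed to a read-subsampling tool of your choice (for example `seqtk sample`, which is a suggestion here and not something these notes prescribe).

```shell
# Print the fraction of reads to keep to go from CURRENT to TARGET coverage.
# Prints 1.00 when the data is already at or below the target.
# Usage: downsample_fraction CURRENT TARGET
downsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        f = (cur > tgt) ? tgt / cur : 1
        printf "%.2f\n", f
    }'
}

downsample_fraction 250 100   # prints 0.40
```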

Legal

See the README.licenses file and individual source code files for details.