lomereiter edited this page Jun 23, 2012 · 8 revisions

Welcome to the BAMread/sambamba wiki!

Sambamba is a set of tools and libraries for seriously fast parsing of SAM/BAM file formats for next-generation sequencing (NGS) datasets. Currently we have full SAM/BAM parsing support, at speeds comparable to samtools. Sambamba is a Google Summer of Code (GSoC) project by Artem Tarasov (lomereiter), written in the D programming language. The code base is written in a functional style and is designed to be fast, easily maintainable, correct, pragmatic, and something to build other functionality on. Sambamba is also designed with parallel computing in mind, so expect speeds to go up soon!

A number of functions are production ready, e.g.

  • reading BAM
  • random access with existing index file
  • SAM/BAM output
  • ...

(Reading SAM will be implemented soon; see the 'samragel' branch.)

Sambamba compiles into shared libraries and command line binaries. Here you can find a few tutorials on how to work with the library and tools.

The first thing to do is to read the getting started page. It covers the basics, such as how to install the library and how to compile the sample code. The next step is to learn how to access alignment records and work with them.

Once you are acquainted with reading BAM and modifying records, you might want to use this library in your pipeline. For that, you need to know how to print records and how to save them to a file. Both SAM and BAM output are quite easy; see the corresponding pages.

Some command-line tools are also supplied with the library. You can find them in the CLItools/ directory. If you're a developer, they can serve as real-world examples of how to use the library. If you're a bioinformatician, you can assess their speed and decide whether they are worth using in your pipelines.