Next-generation massively parallel short-read mapping on FPGAs

Background

Mapping DNA sequences to huge genome databases is an essential analysis task in modern molecular biology. With linearized reference genomes available, aligning the short DNA reads obtained from sequencing an individual genome against such a database provides a powerful diagnostic and analysis tool. In essence, this task amounts to a simple string search that tolerates a certain number of mismatches to account for the diversity of individuals. The complexity of this process arises from the sheer size of the reference genome. It is further amplified by current next-generation sequencing technologies, which produce huge numbers of increasingly short reads. Such short reads severely hurt established alignment heuristics like BLAST.
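The string search described above can be stated precisely as: report every offset in the reference where the read differs in at most k characters. The following is a minimal brute-force sketch in Python, assuming substitution-only mismatches (no insertions or deletions) and no reverse-complement handling. The function name `find_alignments` is hypothetical; the sketch only illustrates the exhaustive, exact semantics and is not the project's FPGA implementation.

```python
def find_alignments(reference: str, read: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (position, mismatches) for every offset where `read` matches
    `reference` with at most `max_mismatches` substituted characters.

    Assumption: mismatches are substitutions only (Hamming distance)."""
    hits = []
    m = len(read)
    for pos in range(len(reference) - m + 1):
        mismatches = 0
        for ref_base, read_base in zip(reference[pos:pos + m], read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this offset can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Example: a read with one substitution relative to both copies of "ACGT".
print(find_alignments("ACGTACGTTA", "ACGA", 1))  # [(0, 1), (4, 1)]
```

Because every offset is checked, no valid alignment can be missed; the cost grows with the reference length times the read length, which is what makes this exhaustive approach attractive for massively parallel hardware rather than for a sequential CPU loop.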

This project provides an FPGA-based custom computation on a Xilinx Virtex-6 FPGA that aligns short DNA reads quickly by exploiting massive concurrency at reasonable cost.

In contrast, advanced fast alignment heuristics like Bowtie and Maq can tolerate only small maximum numbers of mismatches, and their probability of detecting existing valid alignments deteriorates quickly. A performance comparison with these widely used software tools also demonstrates that the proposed FPGA computation achieves its guaranteed exact results in very competitive time.

Architecture

The overall architecture consists of a host system and an FPGA board (ML605 development board), connected directly to the host over Gigabit Ethernet, which performs the alignment of the sequences. The FPGA design is guaranteed to find all alignment locations of a read in the database while allowing a freely adjustable character mismatch threshold.
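One common hardware-friendly way to count character mismatches, shown here only as a generic illustration and not as the documented design of this FPGA, is to encode each base in 2 bits, XOR read and reference, fold each 2-bit field into a single flag bit, and count the set bits. In hardware, many such comparators can evaluate different database offsets in the same clock cycle. The sketch below assumes a hypothetical encoding A=00, C=01, G=10, T=11; the names `pack`, `count_mismatches`, and `scan` are illustrative.

```python
# Hypothetical 2-bit code; the encoding actually used by the project may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base, first base in the lowest bits."""
    value = 0
    for i, base in enumerate(seq):
        value |= CODE[base] << (2 * i)
    return value


def count_mismatches(a: int, b: int, length: int) -> int:
    """Count differing bases between two packed sequences of `length` bases."""
    low_bits = int("01" * length, 2) if length else 0  # one marker bit per 2-bit field
    diff = a ^ b                                        # nonzero field where bases differ
    per_base = (diff | (diff >> 1)) & low_bits          # collapse each field to one bit
    return bin(per_base).count("1")                     # population count


def scan(reference: str, read: str, max_mismatches: int) -> list[int]:
    """Check every window sequentially; on an FPGA each window check could be
    an independent comparator unit running concurrently."""
    m = len(read)
    packed_read = pack(read)
    return [pos for pos in range(len(reference) - m + 1)
            if count_mismatches(pack(reference[pos:pos + m]), packed_read, m) <= max_mismatches]


print(scan("ACGTACGTTA", "ACGA", 1))  # [0, 4]
```

The XOR/OR/popcount chain maps onto plain LUT logic, so the per-offset check has no data-dependent control flow, which is what makes replicating it many times across the FPGA fabric practical.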

Usage (Host-Program)

Option	Short	Description
--query	-q	reads in FASTA or FASTQ format
--database	-d	genome database in FASTA format
--bindb	-b	binary database
--output	-o	output filename
--sam	-s	write the output in SAM format
--unmap	-u	additionally output unmapped reads
--transform	-t	transform the ASCII database into the required binary format
--mismatch [int]	-m	number of allowed mismatches
--status	-i	display FPGA status information
--help	-h	this help text
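The `--transform` option converts an ASCII FASTA database into a binary format, but the layout of that format is not described here. The following is a hypothetical sketch of one plausible approach, packing four bases per byte with the assumed code A=0, C=1, G=2, T=3. It concatenates all records, ignoring sequence names and boundaries, and rejects characters other than A/C/G/T such as `N`; a real binary database would need to handle both. The function name `fasta_to_2bit` is illustrative.

```python
# Hypothetical base code; the project's real binary layout is not documented here.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def fasta_to_2bit(fasta_text: str) -> tuple[bytes, int]:
    """Concatenate all FASTA sequence lines and pack them 4 bases per byte.

    Returns the packed bytes and the number of bases, which is needed to
    ignore the padding bits in the last byte."""
    bases = "".join(line.strip().upper() for line in fasta_text.splitlines()
                    if line.strip() and not line.startswith(">"))
    out = bytearray((len(bases) + 3) // 4)
    for i, base in enumerate(bases):
        if base not in CODE:
            raise ValueError(f"unsupported character {base!r} at offset {i}")
        out[i // 4] |= CODE[base] << (2 * (i % 4))  # first base in the lowest bits
    return bytes(out), len(bases)


packed, n = fasta_to_2bit(">chr1\nACGT\nAC\n")
print(list(packed), n)  # [228, 4] 6
```

Packing 2 bits per base reduces storage to a quarter of the ASCII size and yields data that can be streamed directly into a 2-bit comparison datapath without per-character decoding.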

Documentation and References

The diploma thesis (in German) describes the overall background, implementation details, and performance results. The results have also been published at an international conference:

O. Knodel, T. B. Preußer, R. G. Spallek: Next-generation massively parallel short-read mapping on FPGAs. ASAP 2011, 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors, Santa Monica, USA, September 2011.
