Bioinformatics utilities by Douglas Senalik

bb stands for black box

The bb program lists the other programs in the project, each with a short one- or two-line summary.

Type bb for a list of all programs, or bb followed by some text to limit the list, grep-style, to matching programs. Due to poor programming on my part, bb assumes that programs are installed in /usr/local/bin or /usr/local/bb.


This Perl program takes an assembly of Roche 454 sequences generated by the Roche newbler/gsAssembler and displays all information for one or more specified contigs, in particular the connection and read flowthrough information.

This Perl program takes an assembly of Roche 454 sequences generated by the Roche newbler/gsAssembler and uses the connection information to link the generated contigs into a graphical map.

Massimo Iorizzo, Douglas Senalik, Marek Szklarczyk, Dariusz Grzebelus, David Spooner and Philipp Simon. De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome. BMC Plant Biology 2012, 12:61.

  1. Tongwu Zhang, Xiaowei Zhang, Songnian Hu and Jun Yu. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform. Plant Methods 2011, 7:38. doi:10.1186/1746-4811-7-38. Additional file 1.

  2. Fajardo et al. Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content, and rearrangements revealed by next generation sequencing. Tree Genetics & Genomes, April 2013, Volume 9, Issue 2, pp 489-498.

  3. Chang S, Wang Y, Lu J, Gai J, Li J, et al. (2013) The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels. PLoS ONE 8(2): e56502. doi:10.1371/journal.pone.0056502

  4. Shearman et al. Assembly and analysis of a male sterile rubber tree mitochondrial genome reveals DNA rearrangement events and a novel transcript. BMC Plant Biology 2014, 14:45. doi:10.1186/1471-2229-14-45


This program generates coverage plots from various types of input. Fasta, fastq, sff, bam, or bed file input can be used.
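As a rough illustration of the depth calculation behind a coverage plot, here is a minimal Python sketch. It is not the program's own code: the function name `coverage` and the use of BED-style 0-based, half-open intervals on a single sequence are assumptions made for this example.

```python
def coverage(intervals, length):
    """Per-base depth for a sequence of the given length.

    intervals: iterable of (start, end) pairs, assumed to be BED-style
    (0-based start, end exclusive). Intervals are clamped to the sequence.
    """
    diff = [0] * (length + 1)          # +1 where an interval starts, -1 where it ends
    for start, end in intervals:
        s = min(max(start, 0), length)
        e = min(max(end, 0), length)
        if e > s:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for change in diff[:length]:       # running sum turns changes into depth
        running += change
        depth.append(running)
    return depth
```

For example, `coverage([(0, 3), (2, 5)], 6)` gives `[1, 1, 2, 1, 1, 0]`, which could then be handed to any plotting tool.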


This program collects and parses output from a Shimadzu ELSD-LT (Evaporative Light-Scattering Detector, Low Temperature; original version circa 2002+), as collected from its RS232 serial port.


This program takes an aligned multi-FASTA file as input, and generates a consensus sequence based on the most abundant nucleotide.
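The most-abundant-nucleotide rule can be sketched in a few lines of Python. This is only an illustration, not the program's implementation: the function name `consensus` is hypothetical, gap characters are counted like any other character, and ties go to the character seen first in the column; the real program may treat gaps and ties differently.

```python
from collections import Counter

def consensus(aligned_seqs):
    """Return the most abundant character in each alignment column."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(base.upper() for base in column)
        # On a tie, most_common returns the character seen first in the column.
        result.append(counts.most_common(1)[0][0])
    return "".join(result)

print(consensus(["ACGT-", "ACGTA", "ATGTA"]))  # prints ACGTA
```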


Searches FASTA or FASTQ files and returns the sequences whose header line matches one or more text queries. It can also do search and replace on headers.
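A minimal sketch of header-based searching, for FASTA input only, might look like the following. The names `read_fasta` and `search_headers`, the use of regular expressions for queries, and the `replace` argument are all assumptions for this example and do not reflect the program's actual options.

```python
import re

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def search_headers(lines, queries, replace=None):
    """Yield records whose header matches any query (regular expressions).

    replace: optional (pattern, replacement) pair applied to matching headers.
    """
    patterns = [re.compile(q) for q in queries]
    for header, seq in read_fasta(lines):
        if any(p.search(header) for p in patterns):
            if replace is not None:
                old, new = replace
                header = re.sub(old, new, header)
            yield header, seq
```

Because both functions are generators, a large file can be streamed line by line without loading it into memory.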


This program will allow changing the order or orientation of multiple sequences in FASTA format, or extraction of a subset of sequences. The resulting sequences can optionally be concatenated into a single sequence.
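The idea of reordering, flipping, and optionally joining sequences can be sketched as below. The `reorder` function and the convention that a leading "-" on a name means "reverse complement" are hypothetical, chosen only for this example; they are not the program's command-line syntax.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reorder(records, order, concatenate=False):
    """Return sequences in the requested order.

    records: dict mapping sequence name to sequence.
    order: list of names; a leading "-" (hypothetical syntax) requests the
           reverse complement. Names left out of the list are dropped.
    """
    out = []
    for item in order:
        flip = item.startswith("-")
        name = item[1:] if flip else item
        seq = records[name]
        if flip:
            seq = seq.translate(COMPLEMENT)[::-1]
        out.append((item, seq))
    if concatenate:
        # Join everything into one record under a placeholder name.
        return [("concatenated", "".join(seq for _, seq in out))]
    return out
```

For example, with `records = {"c1": "AACC", "c2": "GGGT"}`, the call `reorder(records, ["c2", "-c1"], concatenate=True)` returns a single record with sequence `GGGTGGTT`.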

Tongwu Zhang, Xiaowei Zhang, Songnian Hu and Jun Yu. An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform. Plant Methods 2011, 7:38. doi:10.1186/1746-4811-7-38. Additional file 1.


Implements gap coding of an aligned multi-FASTA file as described in "GapCoder automates the use of indel characters in phylogenetic analysis", BMC Bioinformatics 2003, 4:6. This program generates output as a tab-delimited matrix.


Sort a gff3 file maintaining child features with the parent feature. Sort alphabetically by sequence name, then numerically by start and then by end coordinate.
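One way to keep children with their parents while sorting is to sort only the top-level features and emit each one followed by its sorted descendants. The sketch below illustrates that idea and is not the program's code: the function names are hypothetical, comment and directive lines are dropped, only the first Parent of a feature is used, and parent IDs are assumed to be unique.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column such as 'ID=e1;Parent=g1' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs

def sort_gff3(lines):
    """Return feature lines sorted by (seqid, start, end), children after parents."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # comments and directives are skipped here
        cols = line.split("\t")
        attrs = parse_attrs(cols[8]) if len(cols) > 8 else {}
        features.append({
            "key": (cols[0], int(cols[3]), int(cols[4])),
            "id": attrs.get("ID"),
            "parent": attrs.get("Parent"),
            "line": line,
        })
    ids = {f["id"] for f in features if f["id"]}
    children, roots = {}, []
    for f in features:
        parent = f["parent"].split(",")[0] if f["parent"] else None
        if parent in ids:
            children.setdefault(parent, []).append(f)
        else:
            roots.append(f)               # no parent, or parent not in this file
    out = []

    def emit(feature):
        out.append(feature["line"])
        for child in sorted(children.get(feature["id"], []), key=lambda c: c["key"]):
            emit(child)

    for root in sorted(roots, key=lambda r: r["key"]):
        emit(root)
    return out
```

Sequence names compare as plain strings and coordinates as integers, matching the alphabetical-then-numerical order described above.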


Marina Iovene, Pablo F. Cavagnaro, Douglas Senalik, C. Robin Buell, Jiming Jiang and Philipp W. Simon. Comparative FISH mapping of Daucus species (Apiaceae family). Chromosome Research, Volume 19, Number 4, 493-506. DOI: 10.1007/s10577-011-9202-y

This is a Perl program that will computationally detect open reading frames in DNA or RNA sequences in FASTA format. This is computationally similar to the NCBI program at, but allows command-line automation of the process, as well as a few additional features.
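As an illustration of what open reading frame detection involves, here is a minimal six-frame scan in Python. It is a sketch under stated assumptions, not the program's algorithm: the function names are hypothetical, only ATG is accepted as a start codon, only TAA, TAG and TGA as stops, and coordinates are reported 0-based with an exclusive end on the forward strand.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def _scan(seq, min_len):
    """Find ATG...stop ORFs in the three frames of one strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))   # stop codon included
                start = None
    return orfs

def find_orfs(seq, min_len=30):
    """Return (strand, start, end) tuples in forward-strand coordinates."""
    seq = seq.upper().replace("U", "T")           # accept RNA input
    n = len(seq)
    found = [("+", s, e) for s, e in _scan(seq, min_len)]
    rc = seq.translate(COMPLEMENT)[::-1]
    # Convert reverse-strand positions back to forward-strand coordinates.
    found += [("-", n - e, n - s) for s, e in _scan(rc, min_len)]
    return sorted(found, key=lambda orf: (orf[1], orf[2], orf[0]))
```

For example, `find_orfs("CCATGAAATAGCC", min_len=9)` returns `[("+", 2, 11)]`.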

A very simple pipe to reverse complement a raw sequence stream; for example, echo "ACTG" | bb.revcomp outputs "CAGT". It can also handle a stream in FASTA format.
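The same behavior can be sketched in Python as follows. This is an illustration rather than the program's code: the function names are hypothetical, only A, C, G and T are complemented (N and other ambiguity codes pass through unchanged), and in raw mode each input line is treated as its own sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement a single sequence string."""
    return seq.translate(COMPLEMENT)[::-1]

def revcomp_stream(lines):
    """Yield output lines for raw or FASTA input lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield revcomp("".join(chunks))
            header, chunks = line, []
        elif header is None:
            if line:                      # raw stream: one sequence per line
                yield revcomp(line)
        else:
            chunks.append(line)           # FASTA: collect the whole record first
    if header is not None:
        yield header
        yield revcomp("".join(chunks))

print(revcomp("ACTG"))  # prints CAGT
```

In FASTA mode the full record is joined before reversing, so a sequence wrapped over several lines is reversed as a whole rather than line by line.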


This program returns all contiguous portions of a final assembly, splitting sequences at every gap of unknown bases (Ns).
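Splitting at runs of N can be sketched with a single regular expression. The function name `split_at_gaps` and the choice to report 0-based, end-exclusive coordinates are assumptions for this example; the program's own output format may differ.

```python
import re

def split_at_gaps(seq):
    """Return (start, end, subsequence) for every run of non-N bases.

    Coordinates are 0-based with an exclusive end; N and n both count as gaps.
    """
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"[^Nn]+", seq)]
```

For example, `split_at_gaps("ACGTNNNNGGCCNA")` returns `[(0, 4, "ACGT"), (8, 12, "GGCC"), (13, 14, "A")]`.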


This program analyzes a bam file of mapped reads to detect regions where a significant fraction of the reads are consistently soft-clipped. These regions can be due to differences from the reference sequence or to misassembly in the reference sequence.

[Figure: an example of a misassembly visualized in IGV, with the "Show soft-clipped bases" option enabled.]
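To show the kind of signal involved, the sketch below tallies where soft clips begin and end, using only the alignment position and CIGAR string of each read. It is a simplified illustration, not the program's method: the function name and `min_reads` threshold are hypothetical, input is a list of (1-based position, CIGAR) pairs rather than a bam file, and it uses a raw read count instead of a fraction of the coverage.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_sites(alignments, min_reads=2):
    """Return reference positions where at least min_reads reads are soft-clipped.

    alignments: iterable of (pos, cigar), pos being the 1-based leftmost
    aligned reference position as in the SAM POS field.
    """
    counts = Counter()
    for pos, cigar in alignments:
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        ops = [o for o in ops if o[1] != "H"]     # ignore hard clips
        if not ops:
            continue
        # Operations M, D, N, = and X consume reference bases.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        if ops[0][1] == "S":
            counts[pos] += 1                      # clip left of first aligned base
        if len(ops) > 1 and ops[-1][1] == "S":
            counts[pos + ref_len - 1] += 1        # clip right of last aligned base
    return sorted(p for p, c in counts.items() if c >= min_reads)
```

Adjacent positions in the result, one from right-side clips and one from left-side clips, suggest a breakpoint between them.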


This program will preprocess paired-end Illumina reads from GBS (Genotyping By Sequencing) experiments to make them compatible with TASSEL. This involves copying the barcode from the forward reads to the beginning of the reverse reads, since TASSEL cannot otherwise identify the reverse reads which do not have a barcode.
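The barcode copy step can be sketched for a single read pair as follows. This is an illustration under assumptions, not the program's code: the function name is hypothetical, reads are (header, sequence, quality) tuples, the longest matching barcode wins, and the quality values for the copied bases are taken from the forward read.

```python
def add_barcode_to_reverse(fwd, rev, barcodes):
    """Copy the barcode found at the start of the forward read onto the reverse read.

    fwd, rev: (header, sequence, quality) tuples for one read pair.
    barcodes: iterable of known barcode sequences.
    Returns the modified reverse read, or None if no barcode matches.
    """
    fwd_seq, fwd_qual = fwd[1], fwd[2]
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for barcode in sorted(barcodes, key=len, reverse=True):
        if fwd_seq.startswith(barcode):
            n = len(barcode)
            return (rev[0], barcode + rev[1], fwd_qual[:n] + rev[2])
    return None
```

Sequence and quality strings stay the same length after the copy, which keeps the FASTQ record valid.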


This program acts as a pipe and inserts time stamps into the stream. It is useful for monitoring long-running programs that do not themselves output timestamps.
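A timestamping pipe of this kind can be sketched in Python as shown below. The function names, the bracketed prefix, and the timestamp format are assumptions for this example and are not the program's actual output format.

```python
import sys
from datetime import datetime

def stamp_lines(lines, now=datetime.now, fmt="%Y-%m-%d %H:%M:%S"):
    """Prefix each line with the time at which it was read."""
    for line in lines:
        yield f"[{now().strftime(fmt)}] {line}"

def main():
    """Copy standard input to standard output, adding a timestamp to each line."""
    for stamped in stamp_lines(sys.stdin):
        sys.stdout.write(stamped)
        sys.stdout.flush()   # flush so timestamps appear as soon as lines arrive
```

Calling `main()` from a script makes it usable as `long_running_command | python stamp.py`, where `stamp.py` is a hypothetical file name. The `now` parameter exists only so the clock can be replaced when testing.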


A wrapper around code originally from a 2009-2010 script by Victor Strelets.

This program creates TopoView tracks for GBrowse from .wig or .bed files

To install prerequisites:

    sudo apt-get install libdb-dev
    sudo cpan BerkeleyDB


