Purpose

splitaake is a reasonably easy method for demultiplexing Illumina reads using either Hamming distance or Levenshtein (edit) distance sequence tags. A the moment, Hamming distance is the default, which is similar to the approach Illumina uses in their own software. splitaake differs from the Illumina software in that it can demultiplex sequence tags of many lengths, rather than just the "standard" TruSeq index length of 6 nucleotides.

Notes

splitaake is under development. This means that it may break, be a pain to get running, etc. I'll improve as I have time. Please feel free to suggest contributions/changes/additions.

Design

splitaake is currently designed as a single-core application meaning that it does not parallelize the process of demultiplexing. After a number of tests, I've found that for most files, a single core approach is reasonably fast (and sometimes faster) than multi-core options, particularly when you wish to work with gzipped fastq files.

I'm still testing additional ways of demultiplexing data in parallel. Hopefully more on this front soon...

Dependencies

seqtools ("working" branch):

pip install git+git://github.com/faircloth-lab/seqtools.git@working

jellyfish (at the moment - quite fast Hamming implementation):
```
pip install git+git://github.com/sunlightlabs/jellyfish
```

Running

generate a config file mapping indexes to filenames. This file is named map.conf, as used below:

TruSeq1:ATCACGATCT
TruSeq2:CGATGTATCT
TruSeq3:TTAGGCATCT
TruSeq4:TGACCAATCT
TruSeq5:ACAGTGATCT
TruSeq6:GCCAATATCT
TruSeq7:CAGATCATCT
TruSeq8:ACTTGAATCT
TruSeq9:GATCAGATCT
TruSeq10:TAGCTTATCT
TruSeq11:GGCTACATCT
TruSeq12:CTTGTAATCT

run splitaake:

python splitaake.py L007_R1.fastq.gz L007_R2.fastq.gz L007_R3.fastq.gz map.conf --section taxa

this will identify your reads and create a directory dmux containing your reads in interleaved, fastq, gzip files, like so:
```
dmux/
    TruSeq1.fastq.gz
    TruSeq2.fastq.gz
    TruSeq3.fastq.gz
    ...
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.rst

README.rst

Purpose

Notes

Design

Dependencies

Running

Files

README.rst

Latest commit

History

README.rst

File metadata and controls

Purpose

Notes

Design

Dependencies

Running