Kractor

kraken extractor

Kractor extracts sequencing reads based on taxonomic classifications obtained via Kraken2. It consumes paired or unpaired fastq[.gz/.bz] files as input alongside a Kraken2 standard output file. It can optionally consume a Kraken2 report to extract all taxonomic parents and children of a given taxid. Fast by default, it outputs fast[q/a] files, which can optionally be compressed.

Kractor significantly enhances processing speed compared to KrakenTools for both paired and unpaired reads. Paired reads are processed approximately 21x quicker for compressed fastqs and 10x quicker for uncompressed. Unpaired reads are approximately 4x faster for both compressed and uncompressed inputs.

For additional details, refer to the benchmarks.

Motivation

Heavily inspired by the great KrakenTools.

The main motivation was to enhance speed when parsing and extracting (writing) a large volume of reads, and also to learn Rust.

Installation

Binaries:

Precompiled binaries for Linux, macOS and Windows are attached to the latest release (0.4.0).

Docker:

A docker image is available on Docker Hub

docker pull samsims/kractor
docker run samsims/kractor --help

Use -v to mount your input and output directories. A typical command might look like:

docker run -v /path/to/input:/input -v /path/to/output:/output samsims/kractor -k /input/<kraken_output> -i /input/<fastq_file> -t <taxonomic_id> -o /output/<output_fastq>

Cargo:

Requires cargo

cargo install kractor

Build from source:

Install rust toolchain:

To install the Rust toolchain, please refer to the Rust documentation.

Clone the repository:

git clone https://github.com/Sam-Sims/Kractor

Build and add to path:

cd Kractor
cargo build --release
export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory Kractor/target/release.

Usage

Basic Usage:

kractor -k <kraken_output> -i <fastq_file> -t <taxonomic_id> -o <output_file> > kractor_report.json

Or, if you have paired-end Illumina reads:

kractor -k <kraken_output> -i <R1_fastq_file> -i <R2_fastq_file> -t <taxonomic_id> -o <R1_output_file> -o <R2_output_file>

If you want to extract all children of a taxon:

kractor -k <kraken_output> -r <kraken_report> -i <fastq_file> -t <taxonomic_id> --children -o <output_file>

Arguments:

Required:

Input

-i, --input

This option specifies the input files containing the reads you want to extract from. They can be compressed (gz, bz2). Paired-end reads can be specified in either of two ways:

Using --input twice: -i <R1_fastq_file> -i <R2_fastq_file>

Using --input once but passing both files: -i <R1_fastq_file> <R2_fastq_file>

This means that bash wildcard expansion works: -i *.fastq

Output

-o, --output

This option will specify the output files containing the extracted reads. The order of the output files is assumed to be the same as the input.

By default the compression will be inferred from the output file extension for supported file types (gz, bz). If the output type cannot be inferred, plaintext will be output.

Kraken Output

-k, --kraken

This option will specify the path to the Kraken2 output containing taxonomic classification of read IDs.

Taxid

-t, --taxid

This option will specify the taxon ID for reads you want to extract.
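To make the roles of the Kraken2 output and the taxon ID concrete, here is a minimal Python sketch of the selection step. It is not Kractor's actual (Rust) implementation. It assumes the standard tab-separated Kraken2 output layout (classification flag, read ID, taxid, length, k-mer mapping) with numeric taxids in the third column, and the function name `read_ids_for_taxids` is hypothetical.

```python
import io
from typing import Iterable, Set, TextIO


def read_ids_for_taxids(kraken_output: TextIO, taxids: Iterable[int]) -> Set[str]:
    """Return the IDs of reads whose assigned taxid is in `taxids`."""
    wanted = set(taxids)
    read_ids = set()
    for line in kraken_output:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Column 2 is the read ID, column 3 the assigned taxid
        # (assumed numeric, i.e. Kraken2 was run without --use-names).
        read_id, taxid = fields[1], fields[2]
        if taxid.isdigit() and int(taxid) in wanted:
            read_ids.add(read_id)
    return read_ids


# Toy input for illustration only, not real classification data.
example = io.StringIO(
    "C\tread1\t562\t150\t562:116\n"
    "U\tread2\t0\t150\t0:116\n"
    "C\tread3\t1280\t150\t1280:116\n"
)
print(sorted(read_ids_for_taxids(example, [562])))
```

The resulting set of read IDs is what would then be looked up while streaming through the input reads.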

Optional:

Output type

-O, --output-type

This option will manually set the compression mode used for the output file and will override the type inferred from the output path.

Valid values are:

gz to output gz
bz2 to output bz2
none to not apply compression
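The precedence between the inferred and the manually set compression mode can be sketched as follows. This is an illustrative Python sketch, not Kractor's code: the function name `resolve_compression` is hypothetical, and the extension mapping (`.gz`, `.bz2`) is an assumption, since the exact suffixes Kractor recognises are not listed here.

```python
from pathlib import Path
from typing import Optional

# Assumed extension mapping; the suffixes Kractor actually recognises may differ.
EXTENSION_TO_COMPRESSION = {".gz": "gz", ".bz2": "bz2"}


def resolve_compression(output_path: str, output_type: Optional[str] = None) -> str:
    """Pick a compression mode: explicit override first, then file extension."""
    if output_type is not None:
        if output_type not in ("gz", "bz2", "none"):
            raise ValueError(f"unsupported output type: {output_type}")
        return output_type
    suffix = Path(output_path).suffix.lower()
    # Unknown extensions fall back to plaintext output.
    return EXTENSION_TO_COMPRESSION.get(suffix, "none")


print(resolve_compression("reads.fastq.gz"))          # inferred from the extension
print(resolve_compression("reads.fastq.gz", "none"))  # explicit value overrides it
print(resolve_compression("reads.fastq"))             # nothing to infer: plaintext
```

Checking the explicit value first mirrors the documented behaviour that `--output-type` overrides whatever the output path suggests.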

Compression level

-l, --level

This option sets the compression level to use when compressing the output. It should be a value from 1 to 9, where 1 is the fastest but produces the largest files and 9 is the slowest but produces the smallest files. The default is 2, which is a good trade-off between speed and file size.

Output fasta

--output-fasta

This option will output a fasta file, with read ids as headers.
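As a rough illustration of what FASTA output with read IDs as headers looks like, the Python sketch below converts FASTQ records to FASTA. It assumes well-formed four-line FASTQ records and takes the read ID to be the header text up to the first whitespace; both function names are hypothetical and this is not Kractor's implementation.

```python
import io
from typing import Iterator, TextIO, Tuple


def fastq_records(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of input (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, which FASTA does not keep
        # The read ID is the header without '@', up to the first whitespace.
        yield header[1:].split()[0], sequence


def fastq_to_fasta(handle: TextIO) -> str:
    """Render every FASTQ record as a two-line FASTA entry."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in fastq_records(handle))


# Toy record for illustration only.
fastq = io.StringIO("@read1 1:N:0\nACGTACGT\n+\nIIIIIIII\n")
print(fastq_to_fasta(fastq), end="")
```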

Kraken Report

-r, --report

This option specifies the path to the report file generated by Kraken2. If you want to use --parents or --children, then this argument is required.

Parents

--parents

This will extract reads classified at all taxa between the root and the specified --taxid.

Children

--children

This will extract all the reads classified as descendants or subtaxa of --taxid (including the taxid itself).
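The report is needed for --parents and --children because it carries the taxonomy tree. The Python sketch below shows one way the tree could be recovered. It is not Kractor's implementation: it assumes the standard six-column Kraken2 report, in which the taxid is the fifth column and the name in the sixth column is indented by two spaces per level. The function names are hypothetical, and whether --parents itself counts the starting taxid is not modelled here.

```python
import io
from typing import Dict, List, Optional, Set, TextIO


def parse_report(report: TextIO) -> Dict[int, Optional[int]]:
    """Map each taxid in a Kraken2 report to its parent taxid (None at the top)."""
    parent_of: Dict[int, Optional[int]] = {}
    stack: List[int] = []  # taxids on the path from the top to the current line
    for line in report:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        taxid, name = int(fields[4]), fields[5]
        depth = (len(name) - len(name.lstrip(" "))) // 2  # two spaces per level
        del stack[depth:]  # drop entries at this depth or deeper
        parent_of[taxid] = stack[-1] if stack else None
        stack.append(taxid)
    return parent_of


def parents(parent_of: Dict[int, Optional[int]], taxid: int) -> Set[int]:
    """Taxa between the top of the tree and `taxid` (not including `taxid`)."""
    found: Set[int] = set()
    current = parent_of.get(taxid)
    while current is not None:
        found.add(current)
        current = parent_of.get(current)
    return found


def children(parent_of: Dict[int, Optional[int]], taxid: int) -> Set[int]:
    """All descendants of `taxid`, including `taxid` itself."""
    return {t for t in parent_of if t == taxid or taxid in parents(parent_of, t)}


# Toy report for illustration only, not real data.
REPORT = (
    "100.00\t1000\t0\tR\t1\troot\n"
    " 90.00\t900\t10\tD\t2\t  Bacteria\n"
    " 60.00\t600\t5\tP\t1224\t    Proteobacteria\n"
    " 50.00\t500\t500\tS\t562\t      Escherichia coli\n"
    " 30.00\t300\t300\tP\t1239\t    Firmicutes\n"
)
tree = parse_report(io.StringIO(REPORT))
print(sorted(parents(tree, 562)))
print(sorted(children(tree, 2)))
```

The set of taxids obtained this way would then be used in place of the single --taxid when selecting reads from the Kraken2 output.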

Exclude

--exclude

This will output every read except those matching the taxid. Works with --parents and --children.
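Conceptually, --exclude inverts the selection: the same set of matched read IDs is computed, but the reads written out are the ones not in that set. A minimal Python sketch, with a hypothetical `select_records` function and an assumed in-memory record type, could look like this:

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str, str]  # (read_id, sequence, quality), assumed layout


def select_records(
    records: Iterable[Record], matched_ids: Set[str], exclude: bool = False
) -> Iterator[Record]:
    """Keep matched reads, or everything except them when `exclude` is set."""
    for record in records:
        is_match = record[0] in matched_ids
        if is_match != exclude:
            yield record


# Toy records for illustration only.
reads = [("read1", "ACGT", "IIII"), ("read2", "TTGA", "IIII")]
print([r[0] for r in select_records(reads, {"read1"})])
print([r[0] for r in select_records(reads, {"read1"}, exclude=True)])
```

Because only the final membership test changes, the matched set can still be widened by --parents or --children before the inversion is applied.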

Skip report

--no-json

This will skip the JSON report that is output to stdout upon programme completion.

Future plans

Version

0.4.0
