Just forgot to pull first
Guillaume Holley committed Jan 25, 2022
2 parents 40241bc + 32e9a1f commit 33fe3fb
Showing 2 changed files with 59 additions and 2 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -2,13 +2,14 @@

### Hybrid error correction of long reads using colored de Bruijn graphs

Ratatosk is a *de novo* error correction tool for erroneous long reads designed for accurate variant calling and assembly. It is based on a compacted and colored de Bruijn graph built from accurate short reads. Reads color paths in the graph while vertices are annotated with candidate *de novo* SNPs and short repeats. We demonstrate that Ratatosk can reduce the raw error rate of ONT reads several fold on average with a mean error rate as low as 1.4%. Variant calling on Ratatosk corrected data shows 99.8% and 79.9% accuracy for SNP and indels respectively. An assembly of the Ashkenazi individual HG002 created from Ratatosk corrected ONT reads yields a contig N50 of 39.7 Mbp and a quality value of 48.5.
Ratatosk is a *de novo* error correction tool for erroneous long reads designed for accurate variant calling and assembly. It is based on a compacted and colored de Bruijn graph built from accurate short reads. Reads color paths in the graph while vertices are annotated with candidate *de novo* SNPs and short repeats. We demonstrate that Ratatosk can reduce the raw error rate of ONT reads several fold on average with a mean error rate as low as 1.4%. Variant calling on Ratatosk corrected data shows 99.91% and 95.88% F1 for SNP and indels respectively. An assembly of the Ashkenazi individual HG002 created from Ratatosk corrected ONT reads yields a contig N50 of 39.7 Mbp and a quality value of 48.5.

## Table of Contents

* [Requirements](#requirements)
* [Installation](#installation)
* [Usage](#usage)
* [Variant calling](#variant-calling)
* [Interface](#interface)
* [FAQ](#faq)
* [Troubleshooting](#troubleshooting)
@@ -122,6 +123,10 @@ cmake -DMAX_KMER_SIZE=96 ..
```
In this example, the maximum *k1*/*k2*-mer length allowed is 95.

## Variant calling

See [Variant calling](variant_calling.md) to call SNPs and indels from Ratatosk-corrected long reads.

## Interface

```
@@ -350,7 +355,7 @@ For any question, feedback or problem, please feel free to file an issue on this

## License

* The xxHash library is BSD licensed (https://github.com/Cyan4973/xxHash)
* The wyhash library is Unlicense licensed (https://github.com/wangyi-fudan/wyhash)
* The popcount library is BSD licensed (https://github.com/kimwalisch/libpopcnt)
* The libdivide library is zlib licensed (https://github.com/ridiculousfish/libdivide)
* The kseq library is copyrighted by Heng Li and released under the MIT license (http://lh3lh3.users.sourceforge.net/kseq.shtml)
52 changes: 52 additions & 0 deletions variant_calling.md
@@ -0,0 +1,52 @@
# Variant calling

Variants can be called from ONT reads corrected with Ratatosk (using Illumina short reads) with the Pepper-MARGIN-DeepVariant pipeline. Pepper-MARGIN-DeepVariant models are provided for ONT reads sequenced on R9.4 flowcells, basecalled with the Guppy 5+ SUP model and corrected with Ratatosk 0.7.5. These models also work on reads from a different flowcell type or basecalled with a different Guppy version, but the calling accuracy will be suboptimal.

## Requirements

* [Pepper-MARGIN-DeepVariant](https://github.com/kishwarshafin/pepper)
* [Pepper-MARGIN-DeepVariant models for Ratatosk](https://drive.google.com/file/d/1AbkKIGY19xbnvVI6PUF_R4YhVOLeXiZw/view?usp=sharing)

## Installation

Decompress the Pepper-MARGIN-DeepVariant models:
```bash
tar -xvzf R9_GUPPY_SUP_MODELS.tar.gz
```

The output should be 5 files:
```bash
ls -lh
# R9_GUPPY_SUP_DEEPVARIANT.data-00000-of-00001
# R9_GUPPY_SUP_DEEPVARIANT.index
# R9_GUPPY_SUP_DEEPVARIANT.meta
# R9_GUPPY_SUP_PEPPER_HP.pkl
# R9_GUPPY_SUP_PEPPER_SNP.pkl
```
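
As a quick sanity check after extraction, a small shell function can confirm that all five files are present before the pipeline is started. This is only a sketch: the file names are taken from the listing above, while the function name `check_ratatosk_models` and its default directory are illustrative assumptions.

```bash
# Report which of the five expected model files are missing from a directory.
# check_ratatosk_models is a hypothetical helper name, not part of Ratatosk
# or Pepper-MARGIN-DeepVariant. It defaults to the current directory.
check_ratatosk_models() {
  local model_dir="${1:-.}"
  local missing=0
  local f
  for f in \
    R9_GUPPY_SUP_DEEPVARIANT.data-00000-of-00001 \
    R9_GUPPY_SUP_DEEPVARIANT.index \
    R9_GUPPY_SUP_DEEPVARIANT.meta \
    R9_GUPPY_SUP_PEPPER_HP.pkl \
    R9_GUPPY_SUP_PEPPER_SNP.pkl
  do
    if [ ! -f "${model_dir}/${f}" ]; then
      echo "missing: ${model_dir}/${f}" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when nothing is missing.
  [ "${missing}" -eq 0 ]
}
```

Calling `check_ratatosk_models /path/to/models` returns a non-zero status and lists the absent files on standard error if the archive was only partially extracted.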

## Input

* Corrected long reads mapping (BAM file)
* Pepper-MARGIN-DeepVariant models for Ratatosk

The corrected long reads must be mapped to a reference genome. We recommend [Winnowmap2](https://github.com/marbl/Winnowmap) with the `-x map-pb` preset or [Minimap2](https://github.com/lh3/minimap2) with the `-x map-hifi` preset for the mapping. The BAM file must be sorted (`samtools sort`) and indexed (`samtools index`).
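
The mapping, sorting and indexing steps described above could be chained as in the following sketch, which uses the Minimap2 option. The function name `map_sort_index`, the argument order and the default thread count are assumptions made for illustration; only the `-x map-hifi` preset, `samtools sort` and `samtools index` come from the text.

```bash
# Map corrected long reads, then sort and index the resulting BAM.
# map_sort_index is a hypothetical helper; file names are supplied by the caller.
# The body runs in a subshell so that the strict-mode settings do not leak out.
map_sort_index() (
  set -euo pipefail
  ref="$1"              # reference genome (FASTA)
  reads="$2"            # Ratatosk-corrected long reads (FASTA/FASTQ)
  out_bam="$3"          # sorted BAM to create
  threads="${4:-8}"     # assumed default, adjust to the machine

  # -a requests SAM output, which samtools sort reads from standard input.
  minimap2 -ax map-hifi -t "${threads}" "${ref}" "${reads}" \
    | samtools sort -@ "${threads}" -o "${out_bam}" -

  # Creates the index next to the BAM (out_bam.bai by default).
  samtools index "${out_bam}"
)

# Example call with placeholder file names:
# map_sort_index reference.fa corrected_reads.fastq corrected_reads.sorted.bam 16
```

Piping straight into `samtools sort` avoids writing an intermediate SAM file, which can be very large for a human genome.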

## Usage

```bash
singularity exec --bind /usr/lib/locale/ \
pepper_deepvariant_r0.7.sif \
run_pepper_margin_deepvariant call_variant \
-b "${INPUT_DIR}/${BAM}" \
-f "${INPUT_DIR}/${REF}" \
-o "${OUTPUT_DIR}" \
-p "${OUTPUT_PREFIX}" \
-s "${SAMPLE}" \
-t "${THREADS}" \
--ont_r9_guppy5_sup \
--dv_model "R9_GUPPY_SUP_DEEPVARIANT" \
--pepper_model R9_GUPPY_SUP_PEPPER_SNP.pkl \
--pepper_hp_model R9_GUPPY_SUP_PEPPER_HP.pkl
```

The Docker command line should be similar to the Singularity one. See [Pepper-MARGIN-DeepVariant](https://github.com/kishwarshafin/pepper) for more information.
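
The command above relies on several shell variables that must be set beforehand. The sketch below shows one way to define them and to verify the inputs before launching the pipeline. All values are placeholders, and `check_pmdv_inputs` is a hypothetical helper. It assumes the BAM index sits next to the BAM file with a `.bai` or `.csi` suffix, which is the default naming used by `samtools index`.

```bash
# Placeholder values for the variables used in the call_variant command.
INPUT_DIR="/path/to/input"
BAM="corrected_reads.sorted.bam"
REF="reference.fa"
OUTPUT_DIR="/path/to/output"
OUTPUT_PREFIX="ratatosk_pmdv"
SAMPLE="SAMPLE_NAME"
THREADS=16

# Check that the BAM, its index and the reference exist.
# Returns non-zero and prints one message per missing input.
check_pmdv_inputs() {
  local bam="$1" ref="$2" status=0
  [ -f "${bam}" ] || { echo "BAM not found: ${bam}" >&2; status=1; }
  [ -f "${bam}.bai" ] || [ -f "${bam}.csi" ] \
    || { echo "BAM index not found (run samtools index): ${bam}" >&2; status=1; }
  [ -f "${ref}" ] || { echo "reference not found: ${ref}" >&2; status=1; }
  return "${status}"
}

# Example: only start variant calling when the inputs are in place.
# check_pmdv_inputs "${INPUT_DIR}/${BAM}" "${INPUT_DIR}/${REF}" && echo "inputs OK"
```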
