
Releases: nanoporetech/bonito

v0.5.0

01 Dec 20:02

Highlights

  • Modified basecalling via Remora.
  • Aligned/unaligned SAM/BAM/CRAM output support with read groups (draft spec).
  • Fast/HAC/SUP models for R9.4.1 E8, R9.4.1 E8.1 and R10.4 E8.1.
  • Performance of the SUP & HAC models is now in line with Guppy.
  • Fully calibrated qstring/qscores for all models.
  • Automatic model downloading.

Modified Basecalling

Methylation/modified base calling can now be enabled with a single flag --modified-bases.

$ bonito basecaller [email protected] reads --modified-bases 5mC --ref ref.mmi | samtools sort -o out.bam -
$ samtools index out.bam
$ modbam2bed -a 0.2 -b 0.8 --cpg -r chr20 -m 5mC -e ref.fa out.bam > results_5mC.bed

Models

All model identifiers now include the model version; ambiguous unversioned models are no longer provided.

| Condition   | Fast                        | High Accuracy              | Super Accuracy             |
|-------------|-----------------------------|----------------------------|----------------------------|
| R9.4.1 E8   | [email protected]   | [email protected]  | [email protected]  |
| R9.4.1 E8.1 | [email protected] | [email protected] | [email protected] |
| R10.4 E8.1  | [email protected]  | [email protected] | [email protected] |

Available models can be listed with bonito download --models --list.

The v3.4 models are newly released; the v3.3 models were available previously, but all models now ship with newly tuned configs. Fast models are now higher-accuracy, 128-wide models.

Model configs have been tuned for performance, and batch sizes have been selected to use approximately 11GB of GPU memory. If you have a GPU with less memory than this, please reduce the batch size with --batchsize when basecalling.
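As a back-of-the-envelope sketch, the batch size can be scaled down in proportion to available GPU memory. The default of 512 below is an assumed illustration, not the actual default of any particular model config; check your model's config.toml for the real value.

```shell
# Scale the batch size in proportion to available GPU memory.
# NOTE: default_batch=512 is an assumed, illustrative value.
default_batch=512
default_mem_gb=11   # the shipped configs target ~11GB
my_mem_gb=8         # e.g. an 8GB card
scaled=$(( default_batch * my_mem_gb / default_mem_gb ))
echo "$scaled"
```

The result would then be passed to the basecaller as --batchsize "$scaled".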

Misc

  • CUDA 11.3 builds added.
  • Updated dependency highlights: pytorch==1.10, mappy==2.23.
  • Duplex calling is superseded by a significantly higher-performance implementation in Guppy 6.0.
  • Basecaller default parameters can now be set in the model config.toml under the [basecaller] section.
  • Command line parameters will now override config.toml settings.
  • SAM tags are included when outputting .fastq (however, SAM/BAM/CRAM output is recommended).
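As an illustration, the [basecaller] section mentioned above might look like the following. The key names and values here are assumptions for illustration only; consult the config.toml shipped with your model for the real keys.

```toml
# Hypothetical sketch of a model config.toml fragment, not an actual shipped config.
[basecaller]
batchsize = 512    # overridden by --batchsize on the command line
chunksize = 4000   # illustrative value
overlap = 500      # illustrative value
```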

Full Changelog: v0.4.0...v0.5.0

v0.4.0

20 May 16:07

Duplex calling

Bonito duplex calling for crf-ctc models. This method takes template & complement pairs to produce higher quality calls.

$ bonito duplex dna_r9.4.1 /data/reads --pairs pairs.txt --reference ref.mmi > basecalls.sam

The pairs file is expected to contain one pair of read-ids per line, separated by a single space.
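For illustration, a minimal pairs file could be generated like this; the read-ids below are made-up placeholders, not real read-ids.

```shell
# Write two example pairs: one template/complement read-id pair per line,
# separated by a single space (read-ids here are placeholders).
printf '%s\n' \
  'read-id-0001 read-id-0002' \
  'read-id-0003 read-id-0004' > pairs.txt
cat pairs.txt
```

The resulting file would then be passed to the duplex caller via --pairs pairs.txt, as in the command above.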

Follow-on reads can also be automatically paired if an alignment summary file is provided instead of a pairs file.

$ bonito duplex dna_r9.4.1 /data/reads --summary alignment_summary.txt --reference ref.mmi > basecalls

The duplex caller replaces the older bonito pair interface and builds on the work from @jordisr and @ihh [1].

  1. Silvestre-Ryan, J., Holmes, I. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing. Genome Biol 22, 38 (2021). https://doi.org/10.1186/s13059-020-02255-1

v0.3.8

22 Apr 00:09
  • 60ac4a7 add scipy to the requirements (hotfix).

v0.3.7

21 Apr 09:09

Improved R9.4.1 ([email protected]) and R10.3 ([email protected]) models for existing chemistries.

  • 9abea07 CRF models move to using a single blank score.
  • 3def77d non-uniform first chunk strategy instead of zero-padding.
  • 3342d1b Ability to call the reverse complement sequence.
  • 651b9c1 bonito export for converting models to guppy.
  • 9ac04dd upgrade to PyTorch 1.8.
  • 9739d6b read trimming.
  • ae221ca handle strings and bytes from h5py.
  • 6349f8d batch summary write to avoid performance degradation with slower output locations (i.e. NFS).

Note: with the upgrade to PyTorch 1.8 this release drops support for Python 3.5.

v0.3.6

24 Feb 10:57

New R9.4.1 and R10.3 models for existing chemistries that improve barcode classification and long deletions, and clean up erroneous repeat runs at the start of calls with R10.3.

The new models are downloaded when bonito v0.3.6 is installed. To manually update dna_r9.4.1 and dna_r10.3 to point to the latest models, run bonito download --models --latest -f; to download all models, run bonito download --models -f.

$ bonito download --models -f
[downloading models]
[downloaded [email protected]]                                                                      
[downloaded [email protected]]                                                                      
[downloaded [email protected]]                                                                      
[downloaded [email protected]]                                                                    
[downloaded [email protected]]                                                                    
[downloaded dna_r9.4.1.zip]                                                                         
[downloaded dna_r10.3_q20ea.zip]                                                                    
[downloaded [email protected]]                                                                       
[downloaded [email protected]]                                                                     
[downloaded dna_r10.3.zip]  

The new model versions are [email protected] and [email protected].

v0.3.5

01 Feb 15:48
  • 06b1040 added dna_r10.3_q20ea model.
  • 3615abc interrupting basecalling with ctrl+c will now exit gracefully.
  • 82dc81d basecalling performance improved up to 35% by overlapping DtoH memcopies with model inference [@sirelkhatim @EpiSlim].
  • 7796927 fix growing memory usage when basecalling.
  • a11be1c add --max-reads argument to basecaller.
  • b1713d4 use a fixed validation directory when training is present.
  • 6ceed73 apply optional indices.npy when loading training data.
  • e2cd5cf remove preshuffle when loading training data.
  • 1ec1d2b ability to fine-tune a pretrained model on new data.
    • bonito training --epochs 1 --lr 5e-4 --pretrained dna_r9.4.1 --directory new-training-data fine-tuned-model

v0.3.2

02 Dec 19:55
  • a7be8a0 R10.3 model.
  • c187c84 uncap the number of training examples by default.
  • 4ec933a stop storing/loading the optimizer state.
  • b39e191 record a more fine-grained loss log.

v0.3.1

15 Nov 15:16
  • 282bf19 new fine-tuned @v3.1 model.
  • 5810be8 medaka model + zymo results.
  • 2c51ec5 recursive directory searching.
  • a39bf52 produce a valid fastq file with CRF models (uses a default qstring value).
  • c1ae23b fix the --save-ctc pipeline for @v2 models.

v0.3.0

28 Oct 09:38
  • Higher accuracy Bonito CRF models (+ a lot faster to train).
  • Multiproc Fast5 producer for faster ingest.
  • Include MD tags when writing SAM.
  • Numerous bugfixes and improvements.

Note: bonito now indirectly depends on CUDA 10.2.

v0.2.3

04 Sep 11:17

Changes

  • 0b7ad17 add --save-ctc to bonito basecaller to save calls in the training format.
  • 82907bd automatically detect if FP16 is supported and use by default if so.
  • 9746bcc set the default number of training epochs to 20.