Releases: nanoporetech/bonito
v0.5.0
Highlights
- Modified basecalling via Remora.
- Aligned/unaligned SAM/BAM/CRAM output support with read groups (draft spec).
- Fast/HAC/SUP models for R9.4.1 E8, R9.4.1 E8.1 and R10.4 E8.1.
- Model performance for SUP & HAC models is now in line with Guppy.
- Fully calibrated qstring/qscores for all models.
- Automatic model downloading.
Modified Basecalling
Methylation/modified base calling can now be enabled with a single flag, --modified-bases.
$ bonito basecaller [email protected] reads --modified-bases 5mC --ref ref.mmi | samtools sort -o out.bam -
$ samtools index out.bam
$ modbam2bed -a 0.2 -b 0.8 --cpg -r chr20 -m 5mC -e ref.fa out.bam > results_5mC.bed
Models
All model identifiers include the model version; ambiguous unversioned models are no longer provided.
Condition | Fast | High Accuracy | Super Accuracy |
---|---|---|---|
R9.4.1 E8 | [email protected] | [email protected] | [email protected] |
R9.4.1 E8.1 | [email protected] | [email protected] | [email protected] |
R10.4 E8.1 | [email protected] | [email protected] | [email protected] |
Available models can be listed with bonito download --models --list.
The v3.4 models are newly released, whereas the v3.3 models were already available; all models, however, have newly tuned configs. Fast models are now higher-accuracy, 128-wide models.
Model configs have been tuned for performance, and the batch sizes have been selected to use approximately 11GB of GPU memory. If your GPU has less memory than this, please reduce the batch size with --batchsize when basecalling.
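As a rough starting point for picking a smaller --batchsize, one could scale a model's default batch size by the ratio of available GPU memory to the 11GB target. The sketch below assumes that memory use grows roughly linearly with batch size, which the release notes do not state; the function name, the example default of 512, and the rounding multiple of 32 are all hypothetical.

```python
def scaled_batchsize(default_batchsize, gpu_memory_gb,
                     target_memory_gb=11.0, multiple=32):
    """Suggest a batch size for a GPU with less memory than the tuned target.

    Assumption (not from the release notes): memory use scales roughly
    linearly with batch size. Treat the result as a first guess only.
    """
    if gpu_memory_gb >= target_memory_gb:
        # Enough memory: keep the tuned default.
        return default_batchsize
    scaled = int(default_batchsize * gpu_memory_gb / target_memory_gb)
    # Round down to a multiple, but never go below one multiple.
    return max(multiple, (scaled // multiple) * multiple)


if __name__ == "__main__":
    # Hypothetical default of 512 on an 8GB card.
    print(scaled_batchsize(512, 8))
```

The suggested value would then be passed on the command line via --batchsize; if the run still exhausts GPU memory, reduce it further.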
Misc
- CUDA 11.3 builds added.
- Updated dependency highlights: pytorch==1.10, mappy=2.23.
- Duplex calling superseded by a significantly higher performance implementation in Guppy 6.0.
- Basecaller default parameters can now be set in the model config.toml under the [basecaller] section.
- Command line parameters will now override config.toml settings.
- SAM tags are included in .fastq output (SAM/BAM/CRAM is recommended, however).
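The precedence between config.toml defaults and command line parameters can be illustrated with a small sketch. This is not Bonito's implementation: the function name is hypothetical, the defaults are passed as a plain dict standing in for a parsed [basecaller] section, and the option names other than --batchsize (here chunksize and overlap) and all values are assumptions chosen for illustration.

```python
import argparse


def resolve_basecaller_params(config_defaults, argv):
    """Merge [basecaller] defaults with command line overrides.

    `config_defaults` stands in for the parsed [basecaller] section of a
    model's config.toml. Option names other than --batchsize are assumed.
    """
    parser = argparse.ArgumentParser()
    # default=None distinguishes "not given" from an explicit value.
    parser.add_argument("--batchsize", type=int, default=None)
    parser.add_argument("--chunksize", type=int, default=None)
    parser.add_argument("--overlap", type=int, default=None)
    args = parser.parse_args(argv)

    params = dict(config_defaults)  # start from the model's defaults
    for key, value in vars(args).items():
        if value is not None:       # the command line wins when present
            params[key] = value
    return params


if __name__ == "__main__":
    # Hypothetical defaults for illustration only.
    defaults = {"batchsize": 512, "chunksize": 10000, "overlap": 500}
    print(resolve_basecaller_params(defaults, ["--batchsize", "128"]))
```

With these hypothetical defaults, only batchsize is replaced by the command line value; the other settings fall through from the config.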
Full Changelog: v0.4.0...v0.5.0
v0.4.0
Duplex calling
Bonito duplex calling for crf-ctc models. This method takes template & complement pairs to produce higher quality calls.
$ bonito duplex dna_r9.4.1 /data/reads --pairs pairs.txt --reference ref.mmi > basecalls.sam
The pairs.csv file is expected to contain one pair of read IDs per line (separated by a single space).
Follow-on reads can also be automatically paired if an alignment summary file is provided instead of a pairs.csv.
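A pairs file in the format described above can be checked before running the duplex caller. The following is a minimal sketch and not part of Bonito: the function name is hypothetical, blank lines are skipped as a convenience, and any line that is not exactly two read IDs separated by a single space is rejected.

```python
def read_pairs(lines):
    """Parse read-id pairs, one pair per line, separated by a single space.

    `lines` can be any iterable of strings, such as an open file handle.
    Returns a list of (first_read_id, second_read_id) tuples.
    """
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(" ")
        # Exactly two non-empty fields means exactly one single space.
        if len(fields) != 2 or not all(fields):
            raise ValueError(
                f"line {line_number}: expected two read ids "
                "separated by a single space"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    # The read ids here are placeholders.
    print(read_pairs(["read-a read-b\n", "read-c read-d\n"]))
```

To validate a file on disk, pass an open handle, for example `with open("pairs.txt") as fh: pairs = read_pairs(fh)`.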
$ bonito duplex dna_r9.4.1 /data/reads --summary alignment_summary.txt --reference ref.mmi > basecalls
The duplex caller replaces the older bonito pair interface and builds on the work from @jordisr and @ihh [1].
1. Silvestre-Ryan, J., Holmes, I. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing. Genome Biol 22, 38 (2021). https://doi.org/10.1186/s13059-020-02255-1
v0.3.8
v0.3.7
Improved R9.4.1 ([email protected]) and R10.3 ([email protected]) models for existing chemistries.
- 9abea07 CRF models move to using a single blank score.
- 3def77d non-uniform first chunk strategy instead of zero-padding.
- 3342d1b Ability to call the reverse complement sequence.
- 651b9c1 bonito export for converting models to guppy.
- 9ac04dd upgrade to PyTorch 1.8.
- 9739d6b read trimming.
- ae221ca handle strings and bytes from h5py.
- 6349f8d batch summary writes to avoid performance degradation with slower output locations (e.g. NFS).
Note: with the upgrade to PyTorch 1.8 this release drops support for Python 3.5.
v0.3.6
New R9.4.1 and R10.3 models for existing chemistries that improve on barcode classification and long deletions, and clean up erroneous repeat runs at the start of calls with R10.3.
The new models are downloaded when bonito v0.3.6 is installed. To manually update dna_r9.4.1 and dna_r10.3 to point to the latest models, run bonito download --models --latest -f, or to download all models, run bonito download --models -f.
$ bonito download --models -f
[downloading models]
[downloaded [email protected]]
[downloaded [email protected]]
[downloaded [email protected]]
[downloaded [email protected]]
[downloaded [email protected]]
[downloaded dna_r9.4.1.zip]
[downloaded dna_r10.3_q20ea.zip]
[downloaded [email protected]]
[downloaded [email protected]]
[downloaded dna_r10.3.zip]
The new model versions are [email protected] and [email protected].
v0.3.5
- 06b1040 added dna_r10.3_q20ea model.
- 3615abc interrupting basecalling with ctrl+c will now exit gracefully.
- 82dc81d basecalling performance improved by up to 35% by overlapping DtoH memcopies with model inference [@sirelkhatim @EpiSlim]
- 7796927 fix growing memory usage when basecalling.
- a11be1c add --max-reads argument to basecaller.
- b1713d4 use a fixed validation directory when training is present.
- 6ceed73 apply optional indices.npy when loading training data.
- e2cd5cf remove preshuffle when loading training data.
- 1ec1d2b ability to finetune a pretrained model on new data:
$ bonito training --epochs 1 --lr 5e-4 --pretrained dna_r9.4.1 --directory new-training-data fine-tuned-model