Wiki on experimental strategies for training models with extended alphabet #242

Kirk3gaard · 2022-03-18T12:56:43Z

Hi

It would be great to have a wiki section covering the entire approach to training new models, including the wetlab part. For example, what would be the best experimental design for training models that predict the incorporation of alternative nucleotides?

Sequence PCR products with one nucleotide fully substituted
Sequence PCR products with "normal" nucleotides
Sequence PCR products with a mix of normal and substituted?

Best regards
Rasmus

mauriciolp · 2022-04-06T07:00:31Z

Hey Rasmus,

I am currently facing the same questions.
It would be useful if ONT could give some tips on this.
For the wetlab part, though, you may need to refer to what has been published, for example:

Kimoto, Michiko, Si Hui Gabriella Soh, and Ichiro Hirao. 2020. “Sanger Gap Sequencing for Genetic Alphabet Expansion of DNA.” Chembiochem: A European Journal of Chemical Biology 21 (16): 2287–96.
Yamashige, Rie, Michiko Kimoto, Yusuke Takezawa, Akira Sato, Tsuneo Mitsui, Shigeyuki Yokoyama, and Ichiro Hirao. 2012. “Highly Specific Unnatural Base Pair Systems as a Third Base Pair for PCR Amplification.” Nucleic Acids Research 40 (6): 2793–2806.

In my case, I have sequencing data from PCR products of a DNA sample with an extended alphabet, and I am tweaking Bonito to train on this data.
I found that some adjustments to the code were necessary to make it work.
Hopefully I can share more once this work progresses.

mauriciolp · 2024-12-13T07:56:28Z

It took me some time to work on this, but I have just uploaded a paper about it to bioRxiv and created a repository for it here.

In our work we show how to achieve high-throughput sequencing of DNA containing Unnatural Bases (UBs), a.k.a. Non-Canonical Bases (NCBs), using Nanopore sequencing and de novo basecalling enabled by splice-based data augmentation. The code here contains a basecaller architecture modified to also basecall 1 or 2 additional UBs, and includes real-time data augmentation that generates training data with UBs in all possible sequence contexts.
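
To make the "extended alphabet" and "all possible contexts" ideas concrete, here is a minimal Python sketch. It is not the code from the repository: the function names (`build_alphabet`, `encode`, `ub_contexts`), the placeholder UB symbols `X` and `Y`, and the choice of `N` at index 0 as a blank label are all assumptions made for illustration. It only shows how a label alphabet could grow by 1 or 2 symbols and how the canonical contexts around a single UB could be enumerated.

```python
from itertools import product

CANONICAL = "ACGT"


def build_alphabet(unnatural_bases):
    """Return the label alphabet: blank first, then canonical bases, then UBs.

    Assumption: index 0 ("N") is reserved as a blank label, as is common
    for CTC-style basecallers. The UB symbols are arbitrary placeholders.
    """
    symbols = ["N"] + list(CANONICAL) + list(unnatural_bases)
    if len(set(symbols)) != len(symbols):
        raise ValueError("alphabet symbols must be unique")
    return symbols


def encode(sequence, alphabet):
    """Map a sequence string to integer labels using the given alphabet."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    return [index[base] for base in sequence]


def ub_contexts(ub, flank):
    """Yield every k-mer with `ub` in the centre and `flank` canonical
    bases on each side (4 ** (2 * flank) k-mers in total)."""
    for left in product(CANONICAL, repeat=flank):
        for right in product(CANONICAL, repeat=flank):
            yield "".join(left) + ub + "".join(right)


if __name__ == "__main__":
    alphabet = build_alphabet("XY")          # two hypothetical UBs
    print(alphabet)
    print(encode("ACXGT", alphabet))
    print(sum(1 for _ in ub_contexts("X", 2)))
```

With two flanking bases on each side there are 4^4 = 256 distinct contexts per UB, which gives a sense of why covering every context with real reads alone is hard and why augmentation is attractive.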
