RNA004 Does not output any CTC data #379

VBHarrisN · 2024-02-05T15:54:26Z

Hello!

I am working on training an RNA-specific basecaller model, and to that end I have been trying to use the RNA004 basecaller to generate training data. However, this model does not seem to output the CTC data correctly. No matter what data I put in, the resulting chunks.npy always has shape 0 by 10000. To make sure the problem was not my data, I fed the same RNA data through the DNA r_10 basecalling model and got a 59000 by 9996 numpy array. Furthermore, every output file from the RNA004 basecaller model is under 1 KB on disk, so I believe they are just empty files. The console even prints "saving CTC data" when using the RNA004 model. I believe this is a bug: the RNA004 model does not throw any errors, it just does not save any data. I am not sure how to proceed, as I need the RNA CTC data to train my own basecalling model.

Let me know if I can provide any more information to help diagnose/solve this problem!
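For anyone trying to reproduce this, here is a minimal sketch for checking what actually landed in the output directory. It assumes only that the directory contains .npy files such as the chunks.npy mentioned above; the function name `summarize_npy_files` is made up for illustration and is not part of bonito.

```python
from pathlib import Path

import numpy as np


def summarize_npy_files(directory):
    """Return {file name: (array shape, size on disk in bytes)} for each .npy file."""
    summary = {}
    for path in sorted(Path(directory).glob("*.npy")):
        # np.load reads the whole array into memory; fine for a quick check.
        array = np.load(path)
        summary[path.name] = (array.shape, path.stat().st_size)
    return summary
```

Calling `summarize_npy_files("path/to/ctc-output")` (a placeholder path) and printing the result shows at a glance whether the first dimension is 0. An array with zero rows still has a small header on disk, which would be consistent with files under 1 KB.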

iiSeymour · 2024-04-04T00:19:27Z

Only high-quality chunks (>99% accuracy by default) are saved for training. You will want to change this filter with --min-accuracy-save-ctc so that it is in line with the accuracy distribution of your RNA calls.

https://github.com/nanoporetech/bonito/blob/master/bonito/cli/basecaller.py#L211
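As a rough illustration of what matching the filter to the distribution means, the sketch below takes a list of per-read accuracies and reports how many reads a given threshold would keep. It is not bonito's code. It assumes accuracies and the threshold are both fractions between 0 and 1 (so the 99% default would be 0.99), which should be confirmed against the linked basecaller.py. The function names are hypothetical.

```python
import math


def fraction_kept(accuracies, min_accuracy):
    """Fraction of reads whose accuracy is at or above min_accuracy."""
    if not accuracies:
        return 0.0
    kept = sum(1 for acc in accuracies if acc >= min_accuracy)
    return kept / len(accuracies)


def threshold_for_fraction(accuracies, keep_fraction):
    """Highest threshold that still keeps at least keep_fraction of the reads."""
    if not accuracies:
        raise ValueError("no accuracies given")
    ranked = sorted(accuracies, reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[n_keep - 1]


# Made-up accuracies, for illustration only.
example = [0.80, 0.85, 0.90, 0.95, 0.995]
print(fraction_kept(example, 0.99))          # one of five reads passes
print(threshold_for_fraction(example, 0.5))  # threshold that keeps at least half
```

Under the fraction assumption, a threshold above 1 (for example 15) can never be met, so nothing would be kept regardless of read quality.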

VBHarrisN · 2024-04-04T14:21:46Z

I had read about this in other GitHub issues. We tried setting the --min-accuracy-save-ctc flag to 15, 1, and 0.2, but no data was ever written to chunks.npy. Our data typically has an average quality score of 14. I don't fully understand how you judge whether a chunk is high quality or not.
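For reference, a quality score and an accuracy percentage can be related through the standard Phred definition, Q = -10 * log10(error probability). The sketch below applies that formula. It assumes the quality score of 14 is a Phred-scaled value, and it says nothing about how bonito itself measures chunk accuracy, which may not match the basecaller's predicted Q-scores.

```python
import math


def phred_to_accuracy(q):
    """Accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy):
    """Phred score implied by an accuracy strictly between 0 and 1."""
    return -10.0 * math.log10(1.0 - accuracy)


print(round(phred_to_accuracy(14), 4))    # about 0.96
print(round(accuracy_to_phred(0.99), 1))  # about Q20
```

By this conversion Q14 corresponds to roughly 96% accuracy, below a 99% cutoff, which is roughly Q20.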

Sgreenfield9 · 2024-04-04T15:42:57Z

We're in the same boat. I've actually dropped my --min-accuracy-save-ctc flag down to 0 but still nothing.

lkwhite mentioned this issue Apr 1, 2024, in "Bonito Train missing dataset.py #377" (closed)

iiSeymour added the "question" (Further information is requested) label Apr 4, 2024

iiSeymour self-assigned this Apr 4, 2024
