`no suitable ctc data to write` when basecalling with the --save-ctc flag #308

Comments
Dear @HanielF and @iiSeymour, we appear to be in a similar position: several million reads as input, but only tens of thousands of them appear to be used. My expectation was that much more of the raw data could be used. Any suggestions on how to overcome this?
Hi @iiSeymour, I have the same issue. I have a Pod5 file which contains multiplexed reads.
The basecalling stops after a few seconds and doesn't seem to work; I got the following result:
My reference sequence contains around 400 bases, and the .mmi file was generated with the `minimap2 -x map-ont -d ./Fasta/myRef.mmi ./Fasta/myRef.fasta` command. I do not understand why I cannot perform the basecalling using the --save-ctc flag. Note that basecalling performs well when not using this flag. Thanks in advance for your answer.
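If it helps with debugging, below is a minimal diagnostic sketch (my own, not part of bonito) that uses the mappy bindings for minimap2 to count how many already-basecalled reads align to the short reference at high accuracy; the paths, the `basecalls.fastq` input, and the 0.99 cut-off are placeholders chosen to mirror the default filter discussed further down. If almost nothing passes, --save-ctc will have nothing to write:

```python
# Rough diagnostic, not bonito code: count basecalled reads that map to the
# reference at >= 99% identity, roughly what the CTC writer expects before
# it keeps a read. Paths and the FASTQ file are placeholders.
import mappy as mp

aligner = mp.Aligner("./Fasta/myRef.fasta", preset="map-ont")  # or the .mmi index
if not aligner:
    raise RuntimeError("failed to load/build the minimap2 index")

total = kept = 0
for name, seq, qual in mp.fastx_read("basecalls.fastq"):  # basecalls made without --save-ctc
    total += 1
    hit = next(aligner.map(seq), None)
    if hit is None or not hit.is_primary:
        continue
    accuracy = hit.mlen / hit.blen  # matching bases / alignment block length
    if accuracy >= 0.99:
        kept += 1

print(f"{kept}/{total} reads map at >= 99% accuracy")
```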
I have the same issue. I tried two datasets of different quality, but even with the slightly higher-quality dataset, the output .bam file was still empty after waiting for 4 hours. I carefully reviewed the basecaller.py file and found that the process of generating labels involves segmenting the electrical signals based on the chunksize, performing basecalling for each chunk, and finally saving the results based on the score. I'm not sure what went wrong during this process.
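For anyone trying to follow that flow, a heavily simplified sketch of the process described above (segment the signal by chunksize, basecall each chunk, keep only high-scoring results) could look like the following; `basecall_chunk` and the score threshold are hypothetical stand-ins, not bonito's actual internals:

```python
import numpy as np

def chunk_signal(signal: np.ndarray, chunksize: int, overlap: int = 0):
    """Split a raw signal into fixed-size chunks (simplified; real code also pads/overlaps)."""
    step = chunksize - overlap
    for start in range(0, max(len(signal) - chunksize, 0) + 1, step):
        yield signal[start:start + chunksize]

def collect_training_chunks(signal, chunksize, min_score, basecall_chunk):
    """Keep only (chunk, sequence) pairs whose basecall score passes the threshold.

    `basecall_chunk` is a placeholder for the model call; it is assumed to
    return (sequence, score) for a single chunk.
    """
    kept = []
    for chunk in chunk_signal(signal, chunksize):
        seq, score = basecall_chunk(chunk)
        if score >= min_score:
            kept.append((chunk, seq))
    return kept
```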
Hi~ I am trying to train a bonito model from scratch.
To obtain my training data, I basecalled the reads with the command below:
It raises the error `no suitable ctc data to write` in `bonito.io.CTCWriter.run()`, which means that none of the reads passed the checks. To improve the accuracy, I replaced the `fast` model with `[email protected]`. Several hundred reads passed the checks this time. Also, it's weird that only 413 reads were saved as CTC data, while a total of 100960 reads were input.
I checked the code in `bonito.io.CTCWriter` and found that most of the reads are filtered out by `self.min_accuracy`, which is set to `0.99` by default. Here are the statistical results for the CTC data:

Obviously, the CTC training data is not enough, even though this is only one of the 50 genomes for the whole training set.
Do I need to lower the value of the `min_accuracy` parameter?
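In case a concrete illustration helps: lowering `min_accuracy` keeps more reads at the cost of noisier labels, while the alternative is to basecall with a more accurate model (as above) so that more reads clear the 0.99 bar. A minimal sketch of the kind of filter being described, with an adjustable threshold, is below; the class and its constructor are illustrative only, not bonito's actual `CTCWriter` API, so check the `bonito.io.CTCWriter` source or `bonito basecaller --help` for the real knob:

```python
# Illustrative only: the shape of the accuracy filter described above.
# In bonito the accuracy comes from aligning each read to the reference;
# here `aligned_accuracy` is simply a hypothetical per-read value.

class SimpleCTCFilter:
    def __init__(self, min_accuracy: float = 0.99):
        self.min_accuracy = min_accuracy  # default mirrors the 0.99 mentioned above

    def keep(self, aligned_accuracy: float) -> bool:
        return aligned_accuracy >= self.min_accuracy

strict = SimpleCTCFilter()                    # 0.99, the reported default
relaxed = SimpleCTCFilter(min_accuracy=0.90)  # keeps more, but noisier, training reads

print(strict.keep(0.95))   # False
print(relaxed.keep(0.95))  # True
```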