I am working on training an RNA-specific basecaller model. To that end, I have been attempting to use the RNA004 basecaller for training. However, this model does not seem to be outputting the CTC data correctly. No matter what data I put in, the resulting chunks.npy is always 0 by 10000. To make sure it was not my data, I fed the same RNA data through the DNA r_10 basecalling model and got a 59000 by 9996 numpy array. Furthermore, all output files from the RNA004 basecaller model are under 1 KB, which I believe means they are effectively empty. In addition, the console even prints "saving CTC data" when using the RNA004 model, so the data does not appear to be the problem. I believe this is a bug: the RNA004 model does not throw any errors, it just does not save any data. I am very confused as to how to proceed, as I need the RNA CTC data to train my basecalling model.
Let me know if I can provide any more information to help diagnose/solve this problem!
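For anyone trying to reproduce the symptom described above, here is a minimal sketch for checking what was actually written. It assumes only that the output directory contains `.npy` files such as chunks.npy; the helper name `summarize_npy` is made up for illustration and is not part of any basecaller tooling.

```python
import sys
from pathlib import Path

import numpy as np


def summarize_npy(directory):
    """Return {file name: (array shape, size on disk in bytes)} for every .npy file."""
    summary = {}
    for path in sorted(Path(directory).glob("*.npy")):
        # np.load reads the whole array into memory; fine for a quick check.
        array = np.load(path)
        summary[path.name] = (array.shape, path.stat().st_size)
    return summary


if __name__ == "__main__":
    # Hypothetical usage: python summarize_npy.py /path/to/ctc-output
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, (shape, size) in summarize_npy(target).items():
        print(f"{name}: shape={shape}, bytes={size}")
```

A first dimension of 0 (as in the 0 by 10000 array reported here) means the file holds a valid header but no chunks, which is consistent with the sub-1 KB file sizes.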
Only high quality chunks (>99% accuracy by default) are saved for training. You will want to change this filter with --min-accuracy-save-ctc to be in line with the distribution on your RNA calls.
I had read about this in other GitHub issues. We tried setting the --min-accuracy-save-ctc flag to 15, 1, and 0.2, but no data was ever written to chunks.npy. Our data typically has an average quality score of 14. I don't fully understand how you judge whether a chunk is high quality.
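As a rough aid for comparing the two scales mentioned in this thread, the standard Phred relation converts a quality score to an expected accuracy. This is only a sketch: whether the basecaller's reported mean quality score corresponds to the per-chunk accuracy used by the --min-accuracy-save-ctc filter is an assumption, not something this thread confirms, and the function names are illustrative.

```python
import math


def qscore_to_accuracy(q):
    """Phred relation: expected accuracy = 1 - 10^(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_qscore(accuracy):
    """Inverse Phred relation: Q = -10 * log10(1 - accuracy)."""
    return -10.0 * math.log10(1.0 - accuracy)


# An average quality score of 14 corresponds to roughly 96% expected accuracy.
print(f"{qscore_to_accuracy(14):.4f}")    # 0.9602
# The >99% default mentioned above corresponds to roughly Q20.
print(f"{accuracy_to_qscore(0.99):.1f}")  # 20.0
```

Under that assumption, reads averaging Q14 would sit below a 99% cutoff, so the default filter alone could plausibly discard most chunks. It does not explain why lower thresholds also produced an empty chunks.npy.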