I am having some issues relating to demultiplexing. I have trained a custom model, it performs extremely well on the DNA of my species of interest, but falls over when it comes to demultiplexing. I am losing more than half of my data to the dreaded "none" bin.
In #26 (comment) it was suggested that trimming the signal could improve this - and this makes a lot of sense. However, the example there assumes the trimming happens while chunkifying an HDF5 file from taiyaki.
I already have the chunk data (from basecalling with --save-ctc) and would like to trim it to achieve the same effect as trimming some offset of signal from the start and end of each read. (I basically want to get rid of the signal that relates to the barcode.)
What I am struggling with is how best to do this, as I don't know what each of the chunkify files contains.
For example, the reference_lengths.npy file has shape (35691,), references.npy has shape (35691, 482), and chunks.npy has shape (35691, 4000). How do these files relate to each other?
Let's say I want to trim 100 signal samples from the start and end of each read, how would I do this? (I am open to suggestions for offset sizes - this was an arbitrary number).
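In case it helps clarify what I'm after, here is a rough sketch of what I imagine the trimming could look like. This assumes chunks.npy holds raw signal chunks, row-aligned with references.npy and reference_lengths.npy - that alignment is an assumption on my part, and the trim_chunks helper is just a name I made up:

```python
import numpy as np

def trim_chunks(chunks, trim):
    """Naively drop `trim` signal samples from both ends of every chunk.

    chunks: 2D array of shape (n_chunks, chunk_len), e.g. (35691, 4000)
    trim:   number of samples to remove from each end
    """
    if chunks.ndim != 2 or 2 * trim >= chunks.shape[1]:
        raise ValueError("trim is too large for the chunk length")
    return chunks[:, trim:-trim]

# Intended usage (file names taken from the --save-ctc output above):
# chunks = np.load("chunks.npy")        # (35691, 4000)
# trimmed = trim_chunks(chunks, 100)    # (35691, 3800)
# np.save("chunks_trimmed.npy", trimmed)
```

My worry with this naive approach is that it leaves references.npy untouched, so each reference row would still contain bases whose signal was removed. If the pairing between chunks and references matters for CTC training (which I assume it does), the references would presumably need trimming too - and that is exactly the part I don't know how to do.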
Sadly, no @touala. I never got any response about this, and I ended up having to abandon my project because of it. I tried many different ways of trimming the data but unfortunately never managed to fix the demultiplexing.
Thanks for the response @mbhall88. I'm currently doing the demultiplexing with the ONT model and then redoing the basecalling with my custom model... Not great, but it seems OK. I'll revisit this soon, as I need to update my whole workflow. Hopefully things have improved since I last tried.