
Trimming CTC data #253

Open
mbhall88 opened this issue May 7, 2022 · 3 comments
mbhall88 commented May 7, 2022

Hi,

I am having some issues relating to demultiplexing. I have trained a custom model that performs extremely well on the DNA of my species of interest, but it falls over when it comes to demultiplexing: I am losing more than half of my data to the dreaded "none" bin.

In #26 (comment) it was suggested that trimming the signal could improve this, which makes a lot of sense. However, the example there assumes trimming happens while chunkifying an HDF5 file with taiyaki.

I have the chunk data already (from basecalling with --save-ctc) and would like to trim this to achieve the same result as trimming the signal at the starts and the ends by some offset. (I basically want to get rid of signal that relates to the barcode.)

What I am struggling with is how best to do this, as I don't know what each of the chunkify output files actually contains.

For example, the reference_lengths.npy file has shape (35691,), references.npy has shape (35691, 482), and chunks.npy has shape (35691, 4000). How do each of these files relate to each other?

Let's say I want to trim 100 signal samples from the start and end of each read; how would I do this? (I am open to suggestions for offset sizes; 100 was an arbitrary number.)
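To make the question concrete, here is a minimal sketch of what I imagine the trimming would look like. It rests entirely on assumptions: that `chunks.npy` holds raw signal (one row per chunk), that `references.npy` holds zero-padded integer-encoded reference bases, that `reference_lengths.npy` gives each row's valid length, and that samples map to bases at a roughly uniform rate within a chunk (synthetic stand-in arrays are used below in place of `np.load`):

```python
import numpy as np

TRIM = 100  # signal samples to drop from each end (arbitrary choice)

# In practice these would come from np.load("chunks.npy") etc.;
# synthetic stand-ins with the shapes from this issue are used here.
rng = np.random.default_rng(0)
n_chunks, chunk_len, max_ref_len = 8, 4000, 482
chunks = rng.normal(size=(n_chunks, chunk_len)).astype(np.float32)
reference_lengths = rng.integers(200, max_ref_len, size=n_chunks)
references = np.zeros((n_chunks, max_ref_len), dtype=np.int8)
for i, L in enumerate(reference_lengths):
    references[i, :L] = rng.integers(1, 5, size=L)  # 1..4 = bases, 0 = pad

# 1) Trim the signal itself.
trimmed_chunks = chunks[:, TRIM:chunk_len - TRIM]

# 2) Estimate how many reference bases the trimmed signal spanned,
#    assuming a uniform samples-per-base rate within each chunk.
samples_per_base = chunk_len / reference_lengths
bases_to_trim = np.ceil(TRIM / samples_per_base).astype(int)

# 3) Drop that many bases from each end of the reference, re-padding with zeros.
trimmed_refs = np.zeros_like(references)
trimmed_lengths = reference_lengths - 2 * bases_to_trim
for i, (b, L) in enumerate(zip(bases_to_trim, reference_lengths)):
    kept = references[i, b:L - b]
    trimmed_refs[i, :kept.size] = kept

print(trimmed_chunks.shape)  # signal rows are now 2 * TRIM samples shorter
```

Whether step 2 is even approximately valid is exactly what I'm unsure about, since the true signal-to-base alignment within each chunk isn't available in these files (as far as I can tell).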


touala commented Nov 29, 2023

Any development on this issue? Thanks in advance.

mbhall88 (Author) commented

Sadly, no @touala. I never got any response here. I ended up having to abandon my project because of this; I tried many different ways of trimming the data but unfortunately couldn't fix the demultiplexing problem.


touala commented Jan 8, 2024

Thanks for the response @mbhall88. I'm currently doing the demultiplexing with the ONT model and then redoing the basecalling with my custom model. Not great, but it seems OK. I'll revisit this soon, as I need to update my whole workflow; hopefully things have improved since I last tried.
