GroupReadsByUmi and optical duplicates #1013

miwalter · 2024-10-14T10:03:54Z

Hi.

In a recent experiment we sequenced the same libraries on a MiSeq (random FC) and a NovaSeq (patterned FC). Both runs gave a similar number of reads, but the NovaSeq run had a 10x higher number of duplicate reads. So I'm wondering whether there is a way to deal with optical duplicates (ODs) on Illumina patterned flow cells when creating the UMI groups?

If I understand the documentation correctly, all reads with the same coordinates and UMI sequence are grouped regardless of whether they are PCR or optical duplicates, and the group is later used to create a consensus call. In the attached example, there is a tag family with 14 read pairs. However, looking at their locations on the flow cell, several copies lie within a pixel distance of 2500 of each other, which is considered an OD on a patterned FC. Some OD clusters have 3-4 copies, while other members of the same UMI family have no ODs. This skews the representation of PCR/library prep errors, and the overall size of the UMI family is overestimated (after accounting for ODs, only 7 unique copies of the same UMI are left). Or do I need to remove optical duplicates first (e.g. with Picard) and then create my UMI consensus?
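To make the family-size arithmetic concrete, here is a minimal Python sketch of how the number of unique copies in one UMI family could be counted from read names. It is an illustration only, not how fgbio or Picard implement optical duplicate detection. It assumes read names follow the Illumina layout `instrument:run:flowcell:lane:tile:x:y`, treats two reads as optical duplicates when they share lane and tile and differ by at most the pixel distance on both axes, and merges such reads transitively. The function names and the example read names are hypothetical.

```python
from collections import defaultdict


def parse_position(read_name):
    """Return (lane, tile, x, y) from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y, so the
    position fields are the 4th through 7th colon-separated fields.
    """
    fields = read_name.split(":")
    lane, tile, x, y = (int(value) for value in fields[3:7])
    return lane, tile, x, y


def find_root(parent, i):
    """Union-find lookup with path compression."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def count_unique_copies(read_names, pixel_distance=2500):
    """Count clusters of reads after merging putative optical duplicates.

    Reads on the same lane and tile whose x and y coordinates each differ
    by at most pixel_distance are merged (transitively) into one cluster.
    Each cluster counts as one unique copy.
    """
    by_tile = defaultdict(list)
    for name in read_names:
        lane, tile, x, y = parse_position(name)
        by_tile[(lane, tile)].append((x, y))

    unique_copies = 0
    for positions in by_tile.values():
        parent = list(range(len(positions)))
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                xi, yi = positions[i]
                xj, yj = positions[j]
                if abs(xi - xj) <= pixel_distance and abs(yi - yj) <= pixel_distance:
                    parent[find_root(parent, i)] = find_root(parent, j)
        unique_copies += len({find_root(parent, i) for i in range(len(positions))})
    return unique_copies


# Made-up read names for one UMI family (not taken from the attached example).
example_names = [
    "A00001:1:HXXXXXXXX:1:1101:1000:1000",
    "A00001:1:HXXXXXXXX:1:1101:1500:2000",    # within 2500 px of the first read
    "A00001:1:HXXXXXXXX:1:1101:20000:30000",  # same tile, far away
    "A00001:1:HXXXXXXXX:1:1102:1000:1000",    # different tile
    "A00001:1:HXXXXXXXX:2:1101:1000:1000",    # different lane
]
print(len(example_names), "reads,", count_unique_copies(example_names), "unique copies")
```

Applied to the read names of a single UMI family, the difference between the number of reads and the returned count is the number of copies that would be attributed to optical duplication under these assumptions.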

Thank you very much for your comments.

miwalter · 2024-10-14T10:05:12Z

Here's the same UMI family after accounting for ODs:
