GroupReadsByUmi and optical duplicates #1013

miwalter · 2024-10-14T10:03:54Z

Hi.

In a recent experiment we sequenced the same libraries on a MiSeq (random FC) and a NovaSeq (patterned FC), with a similar number of reads but a 10x higher number of duplicate reads on the NovaSeq. So I'm wondering whether there is a way to deal with optical duplicates (ODs) on Illumina patterned flow cells when creating the UMI groups?

If I understand the documentation correctly, all reads with the same coordinates and UMI sequence are grouped together, regardless of whether they are PCR or optical duplicates, and are later used to create a consensus call. In the attached example, there is a tag family with 14 read pairs. However, looking at their locations on the flow cell, several copies lie within a pixel distance of 2500 of each other, which is considered an OD on a patterned FC. Some OD clusters have 3-4 copies, while other members of the same UMI family have no ODs. This skews the representation of PCR/library prep errors, and the overall size of the UMI family is overestimated (after accounting for ODs, only 7 unique copies of the same UMI are left). Or do I need to remove optical duplicates first (e.g. with Picard) and then create my UMI consensus?
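To make the check above concrete, here is a rough sketch of how one could count the unique copies in a single UMI family from the read names alone. It is not fgbio or Picard code. It assumes the standard 7-field Illumina read name (`instrument:run:flowcell:lane:tile:x:y`), treats two reads as optical duplicates when they are on the same tile and within 2500 pixels on both axes (the per-axis comparison is my assumption), and links clusters transitively. The function names and the example read names are made up for illustration.

```python
from collections import defaultdict


def parse_location(read_name):
    """Split an Illumina-style read name into a tile key and integer x/y.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_name.split(":")
    tile_key = (fields[2], fields[3], fields[4])  # flowcell, lane, tile
    return tile_key, int(fields[5]), int(fields[6])


def count_clusters_on_tile(coords, pixel_distance):
    """Count groups of reads on one tile, linking reads that are close together."""
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            # Assumption: "close" means within the threshold on both axes.
            if dx <= pixel_distance and dy <= pixel_distance:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(coords))})


def count_unique_copies(read_names, pixel_distance=2500):
    """Family size after collapsing optical duplicates into one copy each."""
    by_tile = defaultdict(list)
    for name in read_names:
        tile_key, x, y = parse_location(name)
        by_tile[tile_key].append((x, y))
    return sum(count_clusters_on_tile(c, pixel_distance) for c in by_tile.values())


# Hypothetical read names for one UMI family (not taken from the attached example).
family = [
    "A00001:1:HXXXX:1:1101:1000:1000",
    "A00001:1:HXXXX:1:1101:1500:2000",    # close to the first read
    "A00001:1:HXXXX:1:1101:20000:30000",  # same tile, far away
    "A00001:1:HXXXX:1:2202:1000:1000",    # same x/y but a different tile
]
print(count_unique_copies(family))  # prints 3
```

The pairwise comparison is quadratic in the number of reads per tile, which is fine for a single UMI family but would need a smarter search for anything larger.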

Thank you very much for your comments.

miwalter · 2024-10-14T10:05:12Z

Here's the same UMI family after accounting for ODs:

yfarjoun · 2025-02-16T16:23:46Z

This is a nice idea that could also be used for other counts of UMI molecules, not just library size estimation.

Questions:

1. Let's assume that the counts in the histogram are produced by ignoring the optical duplicates. Is there an easy way to turn that histogram into a library size? I'm not aware of a tool that takes in a duplicate-set-size histogram and produces a library size (not that the calculation is especially complicated...)
2. Are there other points in consensus calling where one might want to ignore the counts of optical duplicates? For example, the template counts used for filtering?
3. Should there be an option to mark reads as optical duplicates (like in Picard's MarkDuplicates)?
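On the first question, a minimal sketch of the calculation might look like the following. It assumes the histogram maps family size to number of families (with optical duplicates already excluded) and uses the Lander-Waterman relationship C/X = 1 - exp(-N/X), where N is the total number of reads, C is the number of distinct molecules observed and X is the library size. As far as I know this is the same relationship Picard's library size estimate is based on, but the function name, input format and bisection solver here are illustrative assumptions, not an existing tool.

```python
import math


def estimate_library_size(histogram):
    """Estimate library size from a {family_size: number_of_families} histogram.

    Solves C/X = 1 - exp(-N/X) for X by bisection. Returns None when there
    are no duplicates at all, because the equation then has no finite root.
    """
    n_reads = sum(size * count for size, count in histogram.items())  # N
    n_unique = sum(histogram.values())                                # C
    if n_unique == 0 or n_reads <= n_unique:
        return None

    def f(x):
        # Positive below the root, negative above it.
        return n_unique / x - 1.0 + math.exp(-n_reads / x)

    lo = float(n_unique)  # the library cannot be smaller than what was observed
    hi = lo
    while f(hi) > 0:      # grow the upper bound until it brackets the root
        hi *= 2.0

    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical histogram: 500 singletons, 200 families of size 2, 50 of size 3.
example = {1: 500, 2: 200, 3: 50}
estimate = estimate_library_size(example)
```

This treats every family as one sampled molecule and every read as one draw, so it inherits the usual assumption that molecules are sampled uniformly at random.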

cehtolonen added the question label Jan 31, 2025

yfarjoun linked a pull request Feb 16, 2025 that will close this issue

feat: add optical duplicate tagging to GroupReadsByUmi #1029

Draft
