In a recent experiment we sequenced the same libraries on a MiSeq (random FC) and a NovaSeq (patterned FC) with a similar number of reads, but the NovaSeq run had a 10x higher number of duplicate reads. So I'm wondering if there is a way to deal with optical duplicates (OD) on Illumina patterned flow cells when creating the UMI groups?
If I understand the documentation correctly, all reads with the same coordinates and UMI sequence are grouped regardless of whether they are PCR or optical duplicates, and are later used to create a consensus call. In the attached example, there is a tag family with 14 read pairs. However, looking at their locations on the flow cell, several copies lie within a pixel distance of 2500 of each other, which is considered to be an OD on a patterned FC. Some OD clusters have 3-4 copies while other members of the same UMI family have no OD. This will skew the representation of PCR/library prep errors, and the overall size of the UMI family is also overestimated (accounting for ODs, there are only 7 unique copies of the same UMI left). Or do I need to remove optical duplicates first (e.g. with Picard) and then create my UMI consensus?
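To make the counting above concrete, here is a minimal sketch of how the members of one UMI family could be collapsed into optical-duplicate clusters. It is not taken from any existing tool. It assumes 7-field, colon-separated Illumina read names (`instrument:run:flowcell:lane:tile:x:y`), a per-axis distance check within the same tile, and single-linkage clustering; the function names and the example read names are hypothetical.

```python
from collections import defaultdict


def parse_position(read_name):
    """Split an (assumed) 7-field Illumina read name into location fields."""
    parts = read_name.split(":")
    flowcell, lane, tile = parts[2], parts[3], parts[4]
    return flowcell, lane, tile, int(parts[5]), int(parts[6])


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def count_optical_clusters(read_names, pixel_distance=2500):
    """Count clusters of reads in one UMI family after merging optical duplicates.

    Reads on the same flow cell, lane and tile whose x and y coordinates each
    differ by at most `pixel_distance` are linked into the same cluster.
    """
    by_tile = defaultdict(list)
    for name in read_names:
        flowcell, lane, tile, x, y = parse_position(name)
        by_tile[(flowcell, lane, tile)].append((x, y))

    clusters = 0
    for points in by_tile.values():
        parent = list(range(len(points)))
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                dx = abs(points[i][0] - points[j][0])
                dy = abs(points[i][1] - points[j][1])
                if dx <= pixel_distance and dy <= pixel_distance:
                    parent[_find(parent, i)] = _find(parent, j)
        clusters += len({_find(parent, i) for i in range(len(points))})
    return clusters


# Hypothetical family of four reads: two close together on tile 1101,
# one far away on the same tile, one on a different tile.
family = [
    "A00:1:FC:1:1101:1000:1000",
    "A00:1:FC:1:1101:1500:2000",
    "A00:1:FC:1:1101:9000:9000",
    "A00:1:FC:1:1102:1000:1000",
]
print(count_optical_clusters(family))
```

The cluster count, rather than the raw read count, would then be the family size after accounting for ODs.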
Thank you very much for your comments.
This is a nice idea, and it could also be applied to other counts of UMI molecules, not just library size estimation.
Questions:
Let's assume that the counts in the histogram are produced by ignoring the optical duplicates. Is there an easy way to turn that histogram into a library size? I'm not aware of a tool that takes in a duplicate-set-size histogram and produces a library size (not that the calculation is especially complicated...)
Are there other points in consensus calling where one might want to ignore the counts of optical duplicates? For example: the template counts for filtering?
Should there be an option to mark reads as optical duplicates (like in Picard's MarkDuplicates)?
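On the first question, one possible calculation is the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of reads observed, C is the number of distinct molecules observed, and X is the unknown library size. To my knowledge this is the same relation Picard uses for its estimated library size, but treat that as an assumption. The sketch below derives N and C from a duplicate-set-size histogram (with optical duplicates already excluded) and solves for X by bisection; the function name and the example histogram are hypothetical.

```python
import math


def estimate_library_size(histogram):
    """Estimate library size from a duplicate-set-size histogram.

    `histogram` maps duplicate-set size -> number of sets of that size.
    Returns None when no estimate is possible (no data or no duplicates).
    """
    n = sum(size * count for size, count in histogram.items())  # reads observed
    c = sum(histogram.values())  # distinct molecules observed
    if c == 0 or n <= c:
        return None

    def f(x):
        # Expected distinct molecules for library size x, minus the observed count.
        return x * (1.0 - math.exp(-n / x)) - c

    # f is increasing in x, negative at x = c, and tends to n - c > 0.
    lo = hi = float(c)
    while f(hi) < 0:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical histogram: 500 singletons and 100 duplicate sets of size 2.
example = {1: 500, 2: 100}
print(estimate_library_size(example))
```

Bisection is used here only because it needs no extra dependencies; any root finder would do.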