ONT filtering strategies #3738

karynne7 · 2023-11-17T23:05:02Z

The second idea we are working on at UW is adding 2 features for better filtering of ONT data. Ideally we would eventually bring in PacBio or other technologies as well, but ONT filtering is our most immediate priority.

Under the frequency section, we propose adding two datasets: a 1000G ONT dataset we're still putting together, and an HPRC/HGSVC dataset. I see that you don't offer SV frequency datasets as an option unless SV data is loaded for that specific project. Should we implement some trigger for long read data, or would you be ok with making these additions permanent, so that our default variant filtering can pick which datasets to include? (Additionally, does the function work if the dataset has both SVs and short variants, or would you suggest they be separate filtering sets?)

The second feature for ONT is filtering out a set of regions deemed problematic in ONT long reads, which helps us get a more reasonable set of variants to search through. Currently, we remove these variants before loading into Seqr, but ideally we'd like to keep them uploaded for when we want to look deeper into certain regions or have no good candidates on the first pass. We're proposing adding this as an entirely separate section underneath Call Quality, titled "Problematic Regions", with 4 check boxes for the regions in 4 corresponding bed files to exclude. The 4 files cover dinucleotide repeats, homopolymers, segmental duplications, and a low complexity region set made up of centromeric regions and acrocentric p-arms. Users could choose to exclude only the regions they wish, all 4, or none. This requirement would supersede all other annotations marked, similar to the regional searches. Again, would this show up at all times, or only for ONT/Long Read datasets somehow? It may also be useful in short read sequencing, or even with SVs, to have access to these as options.
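To make the proposed behavior concrete, here is a minimal Python sketch of the checkbox-style exclusion described above. It is not seqr code: the function names, the region-set names, and the toy intervals are all assumptions made for illustration, and it uses a naive linear scan that would be too slow for real bed files.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (chrom, start, end) in BED convention: 0-based start, half-open end.
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three columns of BED-formatted lines."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def in_any_interval(chrom: str, pos: int, intervals: List[Interval]) -> bool:
    """pos is 1-based (VCF style), so it is covered when start < pos <= end."""
    return any(c == chrom and s < pos <= e for c, s, e in intervals)


def exclude_problematic(
    variants: List[Tuple[str, int]],
    region_sets: Dict[str, List[Interval]],
    selected: Set[str],
) -> List[Tuple[str, int]]:
    """Drop variants falling in any region set the user ticked."""
    return [
        (chrom, pos)
        for chrom, pos in variants
        if not any(in_any_interval(chrom, pos, region_sets[name]) for name in selected)
    ]


# Hypothetical region sets standing in for two of the four bed files.
region_sets = {
    "homopolymer": parse_bed(["chr1\t100\t110"]),
    "segdup": parse_bed(["chr2\t500\t900"]),
}
variants = [("chr1", 105), ("chr1", 111), ("chr2", 600)]

only_homopolymer = exclude_problematic(variants, region_sets, {"homopolymer"})
all_selected = exclude_problematic(variants, region_sets, {"homopolymer", "segdup"})
none_selected = exclude_problematic(variants, region_sets, set())
```

Because nothing is removed at load time, selecting no region sets returns every variant, which matches the goal of keeping these calls available for a second pass.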

I know you have started looking into implementing long reads in Seqr more officially, and we don't want to conflict with or duplicate your work if you have already started this implementation. So if you have, please let us know where you're currently at and how we can help. And if not, your feedback on design and feasibility is always appreciated, thank you!

hanars · 2023-11-20T15:10:51Z

Maintainer

So just to let you know, the main seqr branch will not be ready to accept any ONT-specific support in the near future. As you mentioned, we are also starting an initial pilot for looking at this data, but I am hesitant to bring in a bunch of code that we will have to maintain in perpetuity while we are still this early in understanding how we want this data to work in seqr. This is not to discourage you from making these changes to your own instance of seqr, and we will certainly want to hear how this works for you and what you do and do not end up finding valuable. I just wanted to make sure we are all on the same page about who the target audience of these features is. That said, we are really not adding any special feature support for these calls right now, so any features you build will not conflict with ours.

From a design perspective, call quality should really be filtering on properties of a specific call, not a specific variant. A specific chrom/pos/ref/alt can have different quality metrics in different samples, and any metrics that are always the same in all samples for a specific chrom/pos/ref/alt really belong as annotation filters or in silico filters.
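The call-versus-variant distinction can be sketched as a small data model. This is only an illustration, not seqr's schema: the class names, the `gq`/`ab` fields, and the `in_homopolymer` flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Call:
    """Per-sample properties: these differ between samples at one locus."""
    gq: int    # genotype quality of this sample's call
    ab: float  # allele balance of this sample's call


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    # Per-variant properties: identical for every sample carrying the variant.
    annotations: Dict[str, bool] = field(default_factory=dict)
    # Per-sample calls, keyed by sample ID.
    calls: Dict[str, Call] = field(default_factory=dict)


def passes_call_quality(variant: Variant, sample_id: str, min_gq: int) -> bool:
    """Call quality filter: the answer depends on which sample is asked about."""
    return variant.calls[sample_id].gq >= min_gq


def passes_annotation_filter(variant: Variant, excluded_flags: Iterable[str]) -> bool:
    """Annotation filter: the answer is the same for every sample."""
    return not any(variant.annotations.get(flag, False) for flag in excluded_flags)


variant = Variant(
    "chr1", 105, "A", "AT",
    annotations={"in_homopolymer": True},
    calls={"proband": Call(gq=60, ab=0.48), "mother": Call(gq=12, ab=0.15)},
)
```

Here the same variant passes a GQ threshold for one sample and fails it for another, while the region flag gives one answer regardless of sample, which is why a region-membership filter sits more naturally with annotation or location filters.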

4 replies

jxchong Nov 20, 2023

The four regions/interval files are actually not ONT-specific. They are flagging different genomic regions that tend to be problematic for all sequencing technologies and callers. It's just that ONT is particularly noisy so needs these flags the most, but it would still be useful to see these regions flagged for other techs too because it helps highlight calls that are more likely to be artifacts (or that your gene of interest may be in a region of the genome where you might be missing additional calls entirely). That's why we were thinking about putting these in "call quality" as opposed to an ONT-specific panel.

hanars Nov 20, 2023
Maintainer

So whether they are ONT specific or not determines whether we want to add the annotations to all data types in the pipeline and make the filtering available for all data types in the UI, or not. That seems like something where it would be great if your team tried them for all data types and determined whether they were useful. However, whether it's ONT specific or not has nothing to do with where in the UI that filtering belongs. As I said above, call quality is supposed to be filtering on the quality of a call, using things like quality metrics that are call specific. Whether a variant is or is not in a genomic region says nothing about the quality of a specific call relative to the quality of that call for other samples at the same locus.

Also, if these are really just genomic regions, is there a reason you wouldn't just use the existing location filtering and the "exclude locations" checkbox?

jxchong Nov 20, 2023

The reason is that these files contain tens of thousands to millions of intervals. Our understanding is that it would have to be implemented like SCREEN. Would you suggest a separate/new expandable panel of "problematic regions" or similar?
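To illustrate why scale matters here, the sketch below shows one common way to test membership against a very large interval set: merge the intervals per chromosome once, then binary-search each variant position. It is an assumption-laden illustration in plain Python, not how seqr or its SCREEN annotation is implemented, and `build_index` and `overlaps` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # BED style: 0-based start, half-open end


def build_index(intervals: Iterable[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Sort and merge intervals per chromosome into parallel start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged: List[Tuple[int, int]] = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Dict[str, Tuple[List[int], List[int]]], chrom: str, pos: int) -> bool:
    """True if 1-based pos lies in some interval, i.e. start < pos <= end."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval whose start is <= pos - 1; since merged intervals
    # are disjoint, it is the only one that could contain pos.
    i = bisect_right(starts, pos - 1) - 1
    return i >= 0 and pos <= ends[i]


index = build_index([
    ("chr1", 100, 110),
    ("chr1", 105, 120),
    ("chr1", 300, 310),
    ("chr2", 0, 5),
])
```

Each lookup costs a binary search over one chromosome's merged intervals rather than a scan of every interval. In a pipeline, the same membership test would more likely be computed once as a per-variant annotation at load time and then filtered on like any other annotation.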

hanars Nov 20, 2023
Maintainer

I would not make a whole new top level section for this. That is a lot of real estate for a set of filters that most analysts are not going to be thinking about for most searches. The sections are supposed to be thematic, and since this is filtering on genomic location, I would probably add these filters to the location panel as a list of checkboxes titled "Exclude regions" or something like that.
