Just to let you know, the main seqr branch will not be ready to accept any ONT-specific support in the near future. As you mentioned, we are also starting an initial pilot to look at this data, but I am hesitant to bring in a lot of code that we will have to maintain in perpetuity while we are still this early in understanding how we want this data to work in seqr. This is not meant to discourage you from making these changes in your own instance of seqr, and we will certainly want to hear how this works for you and what you do and do not end up finding valuable. I just wanted to make sure we are all on the same page about who the target audience of these features is. That said, we are not adding any special feature support for these calls right now, so any features you build will not conflict with ours. From a design perspective, call quality filters should really operate on properties of a specific call, not of a specific variant. A given chrom/pos/ref/alt can have different quality metrics in different samples, and any metric that is always the same in all samples for a given chrom/pos/ref/alt really belongs in the annotation or in silico filters.
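To make the call-versus-variant distinction concrete, here is a minimal sketch of a data model under that rule. The class names, fields (`gq`, `dp`, `annotations`) and thresholds are hypothetical and are not seqr's actual schema: call-quality filters read only per-sample fields, while annotation filters read only site-level fields shared by every sample with that chrom/pos/ref/alt.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Variant:
    """Site-level record (hypothetical): identical for every sample carrying this chrom/pos/ref/alt."""
    chrom: str
    pos: int
    ref: str
    alt: str
    # e.g. population allele frequency, in silico scores
    annotations: Dict[str, float] = field(default_factory=dict)


@dataclass
class Call:
    """Sample-level genotype call (hypothetical): quality metrics can differ between samples."""
    sample_id: str
    variant: Variant
    gq: int  # genotype quality
    dp: int  # read depth


def passes_call_quality(call: Call, min_gq: int = 20, min_dp: int = 10) -> bool:
    # Call-quality filters look only at per-sample fields; default thresholds are illustrative.
    return call.gq >= min_gq and call.dp >= min_dp


def passes_annotation(variant: Variant, key: str, max_value: float) -> bool:
    # Annotation / in silico filters look only at site-level fields.
    # Assumption: a missing annotation is treated as 0.0, so it passes any non-negative cutoff.
    return variant.annotations.get(key, 0.0) <= max_value
```

For example, the same variant can pass quality in one sample and fail in another, while its frequency annotation is evaluated once for the site:

```python
v = Variant("chr1", 1000, "A", "G", {"af": 0.001})
calls = [Call("s1", v, gq=40, dp=30), Call("s2", v, gq=5, dp=3)]
[c.sample_id for c in calls if passes_call_quality(c)]  # ['s1']
```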
The second idea we are working on at UW is adding two features for better filtering of ONT data. Ideally we will eventually support PacBio or other technologies as well, but ONT filtering is our most immediate priority.
Under the frequency section, we propose adding two datasets: a 1000G ONT dataset we are still putting together, and a second HPRC/HGSVC dataset. I see that you don't show SV frequency datasets as an option unless SV data is loaded for that specific project. Should we implement a similar trigger for long-read data, or would you be OK with making these additions permanent and letting our default variant filtering choose which datasets to include? (Also, does the function work if a dataset contains both SVs and short variants, or would you suggest keeping them as separate filtering sets?)
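The "trigger" option could look roughly like the sketch below, which offers frequency datasets only when a matching data type is loaded for the project. The data-type keys and dataset labels are hypothetical placeholders, not seqr identifiers.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from loaded data types to the frequency datasets offered in the UI.
# Keys and labels are illustrative only; dict order controls display order.
FREQUENCY_DATASETS: Dict[str, List[str]] = {
    "SNV_INDEL": ["gnomAD genomes", "gnomAD exomes"],
    "SV": ["gnomAD SVs"],
    "LONG_READ": ["1000G ONT", "HPRC/HGSVC"],
}


def frequency_options(loaded_data_types: Iterable[str]) -> List[str]:
    """Return the frequency filters to display for the data types present in a project."""
    loaded = set(loaded_data_types)
    options: List[str] = []
    for data_type, datasets in FREQUENCY_DATASETS.items():
        if data_type in loaded:
            options.extend(datasets)
    return options
```

The "permanent" alternative would amount to ignoring `loaded_data_types` and always returning every dataset.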
The second feature for ONT is filtering out a set of regions considered problematic in ONT long reads, which would help us get a more reasonable set of variants to search through. Currently we remove these variants before loading into seqr, but ideally we'd like to still have them loaded so we can look deeper into certain regions, or fall back to them when there are no good candidates on the first pass. We propose adding this as a separate section underneath Call Quality titled "Problematic Regions", with 4 checkboxes corresponding to 4 BED files of regions to exclude. The 4 files cover dinucleotide repeats, homopolymers, segmental duplications, and a low-complexity set made up of centromeric regions and acrocentric p-arms. Users could choose to exclude only the regions they wish, all 4, or none. This filter would supersede all other selected annotations, similar to the regional searches. Again, would this section be shown at all times, or only for ONT/long-read datasets somehow? These options could also be useful for short-read sequencing, or even for SVs.
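As a rough illustration of the proposed "Problematic Regions" checkboxes, the sketch below loads BED intervals, merges them per chromosome, and drops any variant whose reference span overlaps a selected region set. The region-set names and in-memory BED lines are hypothetical stand-ins for the four files; it assumes standard BED coordinates (0-based, half-open) and 1-based VCF positions.

```python
import bisect
from typing import Dict, Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end), as in BED


def load_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Parse BED lines into sorted, merged intervals per chromosome."""
    raw: Dict[str, List[Interval]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged: Dict[str, List[Interval]] = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out: List[Interval] = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))  # merge overlapping/adjacent
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def overlaps(regions: Dict[str, List[Interval]], chrom: str, pos: int, ref: str) -> bool:
    """True if the variant's reference span (1-based VCF POS) overlaps any region."""
    start = pos - 1
    end = start + len(ref)
    intervals = regions.get(chrom, [])
    # Index of the last interval whose start < end; merged intervals are disjoint and sorted.
    i = bisect.bisect_left(intervals, (end, -1)) - 1
    return i >= 0 and intervals[i][1] > start


def exclude_problematic(variants: List[dict], region_sets: Dict[str, Dict[str, List[Interval]]],
                        selected: Sequence[str]) -> List[dict]:
    """Drop variants overlapping any selected region set; selecting nothing keeps everything."""
    active = [region_sets[name] for name in selected]
    return [v for v in variants
            if not any(overlaps(r, v["chrom"], v["pos"], v["ref"]) for r in active)]
```

Usage with toy data (the checkbox state maps directly onto `selected`):

```python
region_sets = {
    "homopolymer": load_bed(["chr1\t100\t110", "chr1\t105\t120"]),  # merges to chr1:[100, 120)
    "segdup": load_bed(["chr2\t500\t1000"]),
}
variants = [
    {"chrom": "chr1", "pos": 101, "ref": "A"},
    {"chrom": "chr1", "pos": 121, "ref": "A"},
    {"chrom": "chr2", "pos": 600, "ref": "AT"},
]
exclude_problematic(variants, region_sets, ["homopolymer"])  # keeps chr1:121 and chr2:600
```

Because the exclusion runs independently of the other filters, it matches the idea that this selection supersedes other annotation choices.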
I know you have started looking into supporting long reads in seqr more officially, and we don't want to conflict with or duplicate work if you have already started this implementation. If you have, please let us know where you currently are and how we can help. If not, your feedback on design and feasibility is always appreciated. Thank you!