Replies: 2 comments
-
I remember seeing a "parse" function somewhere too, but I couldn't find it. I did find an example of how this type of data is parsed in the seqr code: seqr/seqr/utils/search/elasticsearch/constants.py, lines 341 to 345 in 9b2815f
-
So dbnsfp comes into seqr in two places: once to annotate genes and once to annotate variants. The variant-level annotations are added in the loading pipeline when variants are joined with the reference data table. Generally speaking, the code for generating and updating the reference data is not well structured for other groups to run; we have no documentation for it, and it's a little finicky. We make the reference data table freely available for download, which is how we recommend users interact with this reference data. However, if you are curious about how the parsing is done for dbnsfp when creating that table, the relevant code is here: We have also recently updated seqr to support a new v3 loading pipeline. While this is not yet ready to be used by other groups, you are welcome to take a look at the code to see how we plan to parse dbnsfp going forward with the new pipeline:
-
Hi all.
Our team has updated to dbNSFP v4.5a, and we have some questions about how the dbNSFP fields are parsed in seqr and the seqr-pipeline.
There are multiple values for sites with multiple transcripts for fields like VEST4, REVEL, and AlphaMissense (new to v4.5a).
From what I can tell, these values are currently parsed out on the frontend by picking one of the (or the first?) non-missing (non-`.`) values. However, there's not always just one non-missing value. Should we instead use one of the transcript quality tags to pick the right scores (looking at you, canonical 😄)?
From the dbNSFP readme:
If so, it would seem like the best thing to do would be to create a custom "select" function for these fields, delimited by `;`, in order to parse the VEP_canonical score for REVEL, VEST4, etc., similar to the way it's done for gnomad: https://github.com/broadinstitute/seqr-loading-pipelines/blob/54ed7bb07c719cdc831d1277afbfa595d01b3fc6/v03_pipeline/lib/reference_data/config.py#L96-L122
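To make the idea concrete, here is a rough sketch in plain Python (not Hail, and not the pipeline's actual code) of what such a "select" function might look like. The function names are hypothetical, and it assumes that the `;`-delimited score values line up positionally with the `VEP_canonical` values and that the canonical transcript is flagged with `YES`; both assumptions would need to be checked against the dbNSFP readme. When the two fields have different lengths, or the canonical transcript has no score, it falls back to the first non-missing value.

```python
from typing import Optional

MISSING = "."  # dbNSFP marker for a missing value


def first_non_missing(raw_scores: str) -> Optional[float]:
    """Return the first non-missing value of a ';'-delimited score field."""
    for value in raw_scores.split(";"):
        if value != MISSING:
            return float(value)
    return None


def select_canonical_score(raw_scores: str, raw_canonical: str) -> Optional[float]:
    """Hypothetical selector: prefer the score of the VEP canonical transcript.

    Assumes both fields are ';'-delimited and positionally aligned,
    and that the canonical transcript is flagged with "YES".
    """
    scores = raw_scores.split(";")
    flags = raw_canonical.split(";")
    # Only trust positional alignment when the lengths match.
    if len(scores) == len(flags):
        for score, flag in zip(scores, flags):
            if flag == "YES" and score != MISSING:
                return float(score)
    # Fallback: first non-missing value.
    return first_non_missing(raw_scores)
```

For example, `select_canonical_score("0.12;.;0.87", ".;.;YES")` picks the third value, whereas taking the first non-missing value would pick the first.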