-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a Gene Shet reference data update cmd. #3576
Conversation
class ShetReferenceDataHandler(ReferenceDataHandler): | ||
|
||
model_cls = GeneShet | ||
url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this belongs in the gene_constraint subfolder, as specified in the ticket
class ShetReferenceDataHandler(ReferenceDataHandler): | ||
|
||
model_cls = GeneShet | ||
url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment in this file explaining how that file was generated/ where it came from
yield { | ||
'gene_id': record['ensg'], | ||
'shet': float(record['post_mean_shet']), | ||
'shet_constrained': bool(int(record['shet_constrained'])), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need this field - we should load the Shet score for any gene we have data for, and we will define a cutoff in seqr for whether or not to flag a gene based on the score cutoff, not a database field. What in the ticket do you anticipate will require this column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the current cutoff looks like 0.1. The requirement ticket doesn't tell the cutoff explicitly. The new 'LoF constr' tag
will be displayed when anyone of LoF, HI, or Shet (this column is true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cutoff is something that we use for display purposes. It should not live in the database in any way, so we should not be loading this
def parse_record(record): | ||
yield { | ||
'gene_id': record['ensg'], | ||
'shet': float(record['post_mean (Shet)']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't post_mean the name of the actual statistic, and Shet is the whole method? So shouldn't the score column be called post_mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the column name of the spreadsheet. I don't know what name is more suitable. Maybe we should ask Lynn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the column name is post_mean (Shet)
. The name of this method is Shet. Therefore, this column is representing the post_mean score for the Shet method. Since the name of the table is Shet, we do not need to capture Shet in the name of the score in the table. Therefore, the name of the score in the table should be post_mean
…t-reference-data-cmd
This is one of the PRs for #3539.