Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a Gene Shet reference data update cmd. #3576

Merged
merged 11 commits into from
Oct 19, 2023
Merged

Conversation

ShifaSZ
Copy link
Contributor

@ShifaSZ ShifaSZ commented Aug 24, 2023

This is one of the PRs for #3539.

@ShifaSZ ShifaSZ requested a review from hanars August 24, 2023 17:52
class ShetReferenceDataHandler(ReferenceDataHandler):

model_cls = GeneShet
url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this belongs in the gene_constraint subfolder, as specified in the ticket

class ShetReferenceDataHandler(ReferenceDataHandler):

model_cls = GeneShet
url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment in this file explaining how that file was generated/ where it came from

yield {
'gene_id': record['ensg'],
'shet': float(record['post_mean_shet']),
'shet_constrained': bool(int(record['shet_constrained'])),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this field - we should load the Shet score for any gene we have data for, and we will define a cutoff in seqr for whether or not to flag a gene based on the score cutoff, not a database field. What in the ticket do you anticipate will require this column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the current cutoff looks like 0.1. The requirement ticket doesn't tell the cutoff explicitly. The new 'LoF constr' tag will be displayed when anyone of LoF, HI, or Shet (this column is true).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cutoff is something that we use for display purposes. It should not live in the database in any way, so we should not be loading this

@ShifaSZ ShifaSZ requested a review from hanars August 25, 2023 15:12
def parse_record(record):
yield {
'gene_id': record['ensg'],
'shet': float(record['post_mean (Shet)']),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't post_mean the name of the actual statistic, and Shet is the whole method? So shouldn't the score column be called post_mean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the column name of the spreadsheet. I don't know what name is more suitable. Maybe we should ask Lynn.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the column name is post_mean (Shet). The name of this method is Shet. Therefore, this column is representing the post_mean score for the Shet method. Since the name of the table is Shet, we do not need to capture Shet in the name of the score in the table. Therefore, the name of the score in the table should be post_mean

@hanars hanars requested review from bpblanken and hanars and removed request for hanars October 11, 2023 15:02
@hanars hanars merged commit beaff74 into dev Oct 19, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants