Create a Gene Shet reference data update cmd. #3576

ShifaSZ · 2023-08-24T03:44:38Z

This is one of the PRs for #3539.

hanars · 2023-08-24T18:31:35Z

reference_data/management/commands/update_gene_shet.py

+class ShetReferenceDataHandler(ReferenceDataHandler):
+
+    model_cls = GeneShet
+    url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'


this belongs in the gene_constraint subfolder, as specified in the ticket

hanars · 2023-08-24T18:32:33Z

reference_data/management/commands/update_gene_shet.py

+class ShetReferenceDataHandler(ReferenceDataHandler):
+
+    model_cls = GeneShet
+    url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'


add a comment in this file explaining how that data file was generated and where it came from

hanars · 2023-08-24T18:36:40Z

reference_data/management/commands/update_gene_shet.py

+        yield {
+            'gene_id': record['ensg'],
+            'shet': float(record['post_mean_shet']),
+            'shet_constrained': bool(int(record['shet_constrained'])),


I'm not sure we need this field - we should load the Shet score for any gene we have data for, and we will define a cutoff in seqr for whether or not to flag a gene based on the score cutoff, not a database field. What in the ticket do you anticipate will require this column?

Hmm, the current cutoff looks like 0.1. The requirement ticket doesn't state the cutoff explicitly. The new 'LoF constr' tag will be displayed when any one of LoF, HI, or Shet (this column) is true.

the cutoff is something that we use for display purposes. It should not live in the database in any way, so we should not be loading this
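
A minimal sketch of the approach requested in this thread: the loader stores only the raw score, and the flag is derived from a cutoff at display time. The constant `SHET_CUTOFF`, the helper `is_shet_constrained`, and the gene ID in the usage lines are hypothetical names introduced for illustration. The value 0.1 is the tentative figure mentioned above, and flagging scores above the cutoff is an assumption, not something this PR confirms.

```python
# Assumed cutoff, taken from the tentative value mentioned in the thread.
SHET_CUTOFF = 0.1


def parse_record(record):
    # Load only the score; no precomputed 'shet_constrained' column is stored.
    yield {
        'gene_id': record['ensg'],
        'shet': float(record['post_mean_shet']),
    }


def is_shet_constrained(shet, cutoff=SHET_CUTOFF):
    # Hypothetical display-side helper: derive the flag from the stored score.
    # Assumes scores above the cutoff are the ones to flag.
    return shet is not None and shet > cutoff


# Placeholder input row; the extra column is ignored by the loader.
rows = list(parse_record({
    'ensg': 'ENSG00000000001',
    'post_mean_shet': '0.25',
    'shet_constrained': '1',
}))
flags = [is_shet_constrained(row['shet']) for row in rows]
```

Keeping the cutoff out of the table means it can change later without reloading the reference data.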

hanars · 2023-08-25T15:44:24Z

reference_data/management/commands/update_gene_shet.py

+    def parse_record(record):
+        yield {
+            'gene_id': record['ensg'],
+            'shet': float(record['post_mean (Shet)']),


isn't post_mean the name of the actual statistic, and Shet is the whole method? So shouldn't the score column be called post_mean?

It is the column name in the spreadsheet. I don't know which name is more suitable. Maybe we should ask Lynn.

the column name is post_mean (Shet). The name of this method is Shet. Therefore, this column is representing the post_mean score for the Shet method. Since the name of the table is Shet, we do not need to capture Shet in the name of the score in the table. Therefore, the name of the score in the table should be post_mean
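
The naming suggested here could look roughly like the sketch below: the source column keeps its spreadsheet header `post_mean (Shet)`, while the stored field is simply `post_mean`. This is an illustration under assumptions, not the merged implementation. The sample TSV content and gene ID are placeholders, and reading the file with `csv.DictReader` is assumed for the example only.

```python
import csv
import io


def parse_record(record):
    # Map the spreadsheet header 'post_mean (Shet)' to the field 'post_mean';
    # the table itself already identifies the method as Shet.
    yield {
        'gene_id': record['ensg'],
        'post_mean': float(record['post_mean (Shet)']),
    }


# Placeholder data standing in for the real reference file.
sample_tsv = "ensg\tpost_mean (Shet)\nENSG00000000001\t0.5\n"
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter='\t')
parsed = [row for record in reader for row in parse_record(record)]
```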

ShifaSZ added 2 commits August 23, 2023 23:40

Create a gene Shet reference data update cmd.

690e199

Solve a merge conflict.

207ce4c

ShifaSZ requested a review from hanars August 24, 2023 17:52

hanars requested changes Aug 24, 2023

ShifaSZ added 2 commits August 25, 2023 10:56

Remove gene_constrained field and change gs folder.

120c5da

Update tests.

6bceb0e

ShifaSZ requested a review from hanars August 25, 2023 15:12

hanars requested changes Aug 25, 2023

hanars added 6 commits October 10, 2023 15:29

remove migration

b14e574

Merge branch 'dev' of https://github.com/broadinstitute/seqr into shet-reference-data-cmd

8d2ffcd

update to use data from publication

72910be

update tests

d9d2ed9

add shet to gene page

4d3788e

combine lof constraint label

8a2c65e

hanars requested review from bpblanken and hanars and removed request for hanars October 11, 2023 15:02

Merge branch 'dev' of https://github.com/broadinstitute/seqr into shet-reference-data-cmd

4a6d53a

bpblanken approved these changes Oct 19, 2023

hanars approved these changes Oct 19, 2023

hanars merged commit beaff74 into dev Oct 19, 2023
4 checks passed
