Create a Gene Shet reference data update cmd. #3576
Merged
11 commits (the diff below shows the changes from 4 of them):

- 690e199 Create a gene Shet reference data update cmd. (ShifaSZ)
- 207ce4c Solve a merge conflict. (ShifaSZ)
- 120c5da Remove gene_constrained field and change gs folder. (ShifaSZ)
- 6bceb0e Update tests. (ShifaSZ)
- b14e574 remove migration (hanars)
- 8d2ffcd Merge branch 'dev' of https://github.com/broadinstitute/seqr into she… (hanars)
- 72910be update to use data from publication (hanars)
- d9d2ed9 update tetss (hanars)
- 4d3788e add shet to gene page (hanars)
- 8a2c65e combine lof constraint label (hanars)
- 4a6d53a Merge branch 'dev' of https://github.com/broadinstitute/seqr into she… (hanars)
@@ -0,0 +1,24 @@

```python
import logging
from reference_data.management.commands.utils.update_utils import GeneCommand, ReferenceDataHandler
from reference_data.models import GeneShet

logger = logging.getLogger(__name__)


class ShetReferenceDataHandler(ReferenceDataHandler):

    model_cls = GeneShet
    # The .tsv file was generated by downloading the Google Sheet at
    # https://docs.google.com/spreadsheets/d/1enxGBWCAFBHdrRlqCj_ueleiDo9K9GWn/edit#gid=1146995171 in TSV format.
    url = 'https://storage.googleapis.com/seqr-reference-data/gene_constraint/shet_Zeng(2023).xlsx%20-%20All%20scores-for%20gene%20page.tsv'

    @staticmethod
    def parse_record(record):
        # Map one TSV row to a GeneShet record: Ensembl gene ID and the Shet score.
        yield {
            'gene_id': record['ensg'],
            'shet': float(record['post_mean (Shet)']),
        }


class Command(GeneCommand):
    reference_data_handler = ShetReferenceDataHandler
```
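To make the column mapping concrete, here is a standalone sketch of how `parse_record` turns TSV rows into record dicts. It assumes, without confirmation from the diff, that `ReferenceDataHandler` passes `parse_record` one dict per data row keyed by the header line, which is what `csv.DictReader` produces. `parse_tsv` and `SAMPLE_TSV` are hypothetical names used only for illustration.

```python
import csv
import io


def parse_record(record):
    # Same mapping as ShetReferenceDataHandler.parse_record: Ensembl gene ID
    # from the 'ensg' column, score from the 'post_mean (Shet)' column.
    yield {
        'gene_id': record['ensg'],
        'shet': float(record['post_mean (Shet)']),
    }


# Hypothetical sample in the same tab-separated layout as the test data.
SAMPLE_TSV = (
    'ensg\thgnc\tpost_mean (Shet)\n'
    'ENSG00000223972\tHGNC:37225\t3.01E-05\n'
    'ENSG00000243485\tHGNC:4013\t5.08E-05\n'
)


def parse_tsv(text):
    # Assumption: rows are read as dicts keyed by the header line.
    reader = csv.DictReader(io.StringIO(text), delimiter='\t')
    records = []
    for row in reader:
        records.extend(parse_record(row))
    return records


for rec in parse_tsv(SAMPLE_TSV):
    print(rec)
```

The `hgnc` column is read but ignored; only the Ensembl ID and the score end up in each record.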
@@ -0,0 +1,20 @@

```python
from reference_data.models import GeneShet
from reference_data.management.tests.test_utils import ReferenceDataCommandTestCase


class UpdateGeneShetTest(ReferenceDataCommandTestCase):
    URL = 'https://storage.googleapis.com/seqr-reference-data/gene_constraint/shet_Zeng(2023).xlsx%20-%20All%20scores-for%20gene%20page.tsv'
    # Tab-separated rows, matching the downloaded .tsv file.
    DATA = [
        'ensg\thgnc\tpost_mean (Shet)\n',
        'ENSG00000223972\tHGNC:37225\t3.01E-05\n',
        'ENSG00000227233\tHGNC:26441\t4.85E-05\n',
        'ENSG00000243485\tHGNC:4013\t5.08E-05\n',
    ]

    def test_update_gene_shet_command(self):
        self._test_update_command('update_gene_shet', 'GeneShet', created_records=2)

        self.assertEqual(GeneShet.objects.count(), 2)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000223972')
        self.assertEqual(record.shet, 3.01E-05)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000243485')
        self.assertEqual(record.shet, 5.08E-05)
```
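The test feeds three rows but expects `created_records=2`, and only two of the three genes are checked afterwards. A plausible reading, assumed here rather than confirmed by the diff, is that rows whose Ensembl ID has no matching `GeneInfo` record in the test fixtures (here `ENSG00000227233`) are skipped. The sketch below models that filtering with a hypothetical `known_gene_ids` set standing in for the `GeneInfo` table; `filter_known_genes` is a made-up helper, not part of seqr.

```python
def filter_known_genes(records, known_gene_ids):
    """Split parsed records into those with a known gene and the skipped IDs."""
    kept, skipped = [], []
    for record in records:
        # GeneShet.gene is a ForeignKey to GeneInfo, so a row can only be
        # stored if its gene_id resolves to an existing gene.
        if record['gene_id'] in known_gene_ids:
            kept.append(record)
        else:
            skipped.append(record['gene_id'])
    return kept, skipped


parsed = [
    {'gene_id': 'ENSG00000223972', 'shet': 3.01e-05},
    {'gene_id': 'ENSG00000227233', 'shet': 4.85e-05},
    {'gene_id': 'ENSG00000243485', 'shet': 5.08e-05},
]
# Hypothetical stand-in for the genes present in the test fixture.
known_gene_ids = {'ENSG00000223972', 'ENSG00000243485'}

kept, skipped = filter_known_genes(parsed, known_gene_ids)
print(len(kept), skipped)
```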
@@ -0,0 +1,22 @@

```python
# Generated by Django 3.2.20 on 2023-08-25 14:33

from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        ('reference_data', '0021_auto_20221031_2049'),
    ]

    operations = [
        migrations.CreateModel(
            name='GeneShet',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('shet', models.FloatField()),
                ('gene', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='reference_data.geneinfo')),
            ],
        ),
    ]
```
Isn't `post_mean` the name of the actual statistic, and Shet the name of the whole method? If so, shouldn't the score column be called `post_mean`?
It is the column name of the spreadsheet. I don't know what name is more suitable. Maybe we should ask Lynn.
The column name is `post_mean (Shet)`. The name of this method is Shet, so this column represents the `post_mean` score for the Shet method. Since the table is already named Shet, we do not need to repeat Shet in the name of the score; the score in the table should be called `post_mean`.
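For illustration only, this is how `parse_record` might look if the reviewer's suggestion were adopted. It assumes a hypothetical rename of the model field from `shet` to `post_mean`, which neither the diff nor the thread confirms. The source column keeps its name `post_mean (Shet)` because that is the header of the published spreadsheet.

```python
def parse_record(record):
    # Hypothetical variant: store the score under 'post_mean' (the statistic)
    # rather than 'shet' (the method, already conveyed by the GeneShet table name).
    yield {
        'gene_id': record['ensg'],
        'post_mean': float(record['post_mean (Shet)']),
    }


row = {'ensg': 'ENSG00000223972', 'hgnc': 'HGNC:37225', 'post_mean (Shet)': '3.01E-05'}
print(next(parse_record(row)))
```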