Create a Gene Shet reference data update cmd. #3576
@@ -0,0 +1,23 @@
import logging
from reference_data.management.commands.utils.update_utils import GeneCommand, ReferenceDataHandler
from reference_data.models import GeneShet

logger = logging.getLogger(__name__)


class ShetReferenceDataHandler(ReferenceDataHandler):

    model_cls = GeneShet
    url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
Review comment: Add a comment in this file explaining how that file was generated and where it came from.

    @staticmethod
    def parse_record(record):
        yield {
            'gene_id': record['ensg'],
            'shet': float(record['post_mean_shet']),
            'shet_constrained': bool(int(record['shet_constrained'])),
Review comment: I'm not sure we need this field. We should load the Shet score for any gene we have data for, and seqr will decide whether or not to flag a gene based on a score cutoff defined in seqr, not on a database field. What in the ticket do you anticipate will require this column?
Reply: Hmm, the current cutoff looks like 0.1. The requirement ticket doesn't state the cutoff explicitly. The new
Reply: The cutoff is something that we use for display purposes. It should not live in the database in any way, so we should not be loading this.
        }


class Command(GeneCommand):
    reference_data_handler = ShetReferenceDataHandler
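The review thread on `shet_constrained` asks that only the raw score be loaded and that the cutoff be applied at display time. A minimal standalone sketch of that split follows. It is not the merged implementation: `SHET_DISPLAY_CUTOFF`, `is_shet_constrained`, and the `>=` comparison are hypothetical names and assumptions introduced here for illustration, and the value 0.1 is only the figure mentioned in the thread.

```python
# Hypothetical display-side cutoff (assumption): the thread mentions 0.1,
# but the ticket does not state a final value.
SHET_DISPLAY_CUTOFF = 0.1


def parse_record(record):
    """Yield only the fields to store: the gene ID and the raw Shet score."""
    yield {
        'gene_id': record['ensg'],
        'shet': float(record['post_mean_shet']),
    }


def is_shet_constrained(shet, cutoff=SHET_DISPLAY_CUTOFF):
    """Hypothetical helper: decide at display time whether to flag a gene.

    Assumes a gene is flagged when its score is at or above the cutoff.
    Nothing computed here is written to the database.
    """
    return shet >= cutoff


# A row shaped like the ones in the test data; the extra columns are ignored.
row = {
    'ensg': 'ENSG00000243485',
    'hgnc': 'HGNC:4013',
    'post_mean_shet': '5.08E-05',
    'shet_constrained': '1',
}
parsed = next(parse_record(row))
flagged = is_shet_constrained(parsed['shet'])
```

With this split, changing the cutoff needs no data reload or migration, which is the point the reviewer makes about keeping it out of the database.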
@@ -0,0 +1,22 @@
from reference_data.models import GeneShet
from reference_data.management.tests.test_utils import ReferenceDataCommandTestCase

class UpdateGeneShetTest(ReferenceDataCommandTestCase):
    URL = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
    DATA = [
        'ensg hgnc post_mean_shet shet_constrained\n',
        'ENSG00000223972 HGNC:37225 3.01E-05 0\n',
        'ENSG00000227233 HGNC:26441 4.85E-05 0\n',
        'ENSG00000243485 HGNC:4013 5.08E-05 1\n',
    ]

    def test_update_gene_cn_sensitivity_command(self):
        self._test_update_command('update_gene_shet', 'GeneShet', created_records=2)

        self.assertEqual(GeneShet.objects.count(), 2)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000223972')
        self.assertEqual(record.shet, 3.01E-05)
        self.assertEqual(record.shet_constrained, False)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000243485')
        self.assertEqual(record.shet, 5.08E-05)
        self.assertEqual(record.shet_constrained, True)
@@ -0,0 +1,23 @@
# Generated by Django 3.2.20 on 2023-08-22 20:45

from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        ('reference_data', '0021_auto_20221031_2049'),
    ]

    operations = [
        migrations.CreateModel(
            name='GeneShet',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('shet', models.FloatField()),
                ('shet_constrained', models.BooleanField()),
                ('gene', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='reference_data.geneinfo')),
            ],
        ),
    ]
Review comment: This belongs in the gene_constraint subfolder, as specified in the ticket.