Create a Gene Shet reference data update cmd. #3576

Merged Oct 19, 2023 (11 commits)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
# _seqr_ Changes

## dev
* Add GeneShet model to the reference DB (REQUIRES DB MIGRATION)

## 8/22/23
* Add db indices to optimize RNA data queries (REQUIRES DB MIGRATION)
23 changes: 23 additions & 0 deletions reference_data/management/commands/update_gene_shet.py
@@ -0,0 +1,23 @@
import logging
from reference_data.management.commands.utils.update_utils import GeneCommand, ReferenceDataHandler
from reference_data.models import GeneShet

logger = logging.getLogger(__name__)


class ShetReferenceDataHandler(ReferenceDataHandler):

    model_cls = GeneShet
    url = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
Collaborator

this belongs in the gene_constraint subfolder, as specified in the ticket

Collaborator

add a comment in this file explaining how that data file was generated and where it came from


    @staticmethod
    def parse_record(record):
        yield {
            'gene_id': record['ensg'],
            'shet': float(record['post_mean_shet']),
            'shet_constrained': bool(int(record['shet_constrained'])),
Collaborator

I'm not sure we need this field - we should load the Shet score for any gene we have data for, and we will define a cutoff in seqr that decides whether or not to flag a gene based on its score, rather than storing that decision in a database field. What in the ticket do you anticipate will require this column?

Contributor Author

Hmm, the current cutoff looks like 0.1. The requirement ticket doesn't state the cutoff explicitly. The new 'LoF constr' tag will be displayed when any one of LoF, HI, or Shet (this column) is true.

Collaborator

the cutoff is something that we use for display purposes. It should not live in the database in any way, so we should not be loading this.

        }


class Command(GeneCommand):
    reference_data_handler = ShetReferenceDataHandler
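
The review thread on `shet_constrained` argues that the cutoff is a display concern and should be applied to the stored score rather than loaded as a column. A minimal sketch of what such a display-side check might look like follows. The helper name `is_shet_constrained`, the constant `SHET_CUTOFF`, the 0.1 value (mentioned by the contributor but not confirmed by the ticket), and the inclusive comparison are all assumptions for illustration, not seqr code.

```python
# Hypothetical display-side helper; none of these names exist in seqr.
SHET_CUTOFF = 0.1  # assumed value, taken from the review discussion


def is_shet_constrained(shet, cutoff=SHET_CUTOFF):
    """Return True when a gene's Shet score meets the display cutoff.

    Genes without a loaded score (shet is None) are never flagged.
    Whether the comparison should be inclusive is an assumption.
    """
    return shet is not None and shet >= cutoff


# The flag is derived from the score on demand, so changing the cutoff
# needs no migration or data reload.
flags = {
    'high_score_gene': is_shet_constrained(0.90576),
    'low_score_gene': is_shet_constrained(3.01e-05),
    'missing_score_gene': is_shet_constrained(None),
}
```

Keeping only the raw score in the database means the threshold can change without touching stored data, which is the point the reviewer makes.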
22 changes: 22 additions & 0 deletions reference_data/management/tests/update_gene_shet_tests.py
@@ -0,0 +1,22 @@
from reference_data.models import GeneShet
from reference_data.management.tests.test_utils import ReferenceDataCommandTestCase

class UpdateGeneShetTest(ReferenceDataCommandTestCase):
    URL = 'https://storage.googleapis.com/seqr-reference-data/Shet/Shet_Zeng_2023.tsv'
    DATA = [
        'ensg\thgnc\tpost_mean_shet\tshet_constrained\n',
        'ENSG00000223972\tHGNC:37225\t3.01E-05\t0\n',
        'ENSG00000227233\tHGNC:26441\t4.85E-05\t0\n',
        'ENSG00000243485\tHGNC:4013\t5.08E-05\t1\n',
    ]

    def test_update_gene_cn_sensitivity_command(self):
        self._test_update_command('update_gene_shet', 'GeneShet', created_records=2)

        self.assertEqual(GeneShet.objects.count(), 2)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000223972')
        self.assertEqual(record.shet, 3.01E-05)
        self.assertEqual(record.shet_constrained, False)
        record = GeneShet.objects.get(gene__gene_id='ENSG00000243485')
        self.assertEqual(record.shet, 5.08E-05)
        self.assertEqual(record.shet_constrained, True)
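
To see what the handler's `parse_record` would yield for rows like the test data above without a Django setup, the parsing step can be sketched in isolation. This assumes the source file is tab-separated, as the `.tsv` URL suggests, and it keeps only the gene ID and score, following the reviewer's request to drop `shet_constrained`. The function name `parse_shet_rows` is hypothetical and the use of `csv.DictReader` is a stand-in for whatever `ReferenceDataHandler` does internally.

```python
import csv


def parse_shet_rows(lines):
    """Yield one dict per data row, keeping only the gene ID and the score.

    `lines` is any iterable of tab-separated text lines whose first line
    is the header (ensg, hgnc, post_mean_shet, ...).
    """
    reader = csv.DictReader(lines, delimiter='\t')
    for record in reader:
        yield {
            'gene_id': record['ensg'],
            'shet': float(record['post_mean_shet']),
        }


# Sample rows mirroring the test fixture, written with explicit tabs.
SAMPLE = [
    'ensg\thgnc\tpost_mean_shet\tshet_constrained\n',
    'ENSG00000223972\tHGNC:37225\t3.01E-05\t0\n',
    'ENSG00000243485\tHGNC:4013\t5.08E-05\t1\n',
]

rows = list(parse_shet_rows(SAMPLE))
```

Each row becomes a plain dict, so the unused `hgnc` and `shet_constrained` columns are simply ignored rather than stored.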
23 changes: 23 additions & 0 deletions reference_data/migrations/0022_geneshet.py
@@ -0,0 +1,23 @@
# Generated by Django 3.2.20 on 2023-08-22 20:45

from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        ('reference_data', '0021_auto_20221031_2049'),
    ]

    operations = [
        migrations.CreateModel(
            name='GeneShet',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('shet', models.FloatField()),
                ('shet_constrained', models.BooleanField()),
                ('gene', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='reference_data.geneinfo')),
            ],
        ),
    ]
10 changes: 10 additions & 0 deletions reference_data/models.py
@@ -161,6 +161,16 @@ class Meta:
        json_fields = ['pHI', 'pTS']


class GeneShet(models.Model):
    gene = models.ForeignKey(GeneInfo, on_delete=models.CASCADE)

    shet = models.FloatField()
    shet_constrained = models.BooleanField()

    class Meta:
        json_fields = ['shet', 'shet_constrained']


class Omim(models.Model):
    MAP_METHOD_CHOICES = (
        ('1', 'the disorder is placed on the map based on its association with a gene, but the underlying defect is not known.'),
9 changes: 9 additions & 0 deletions seqr/fixtures/reference_data.json
@@ -1167,6 +1167,15 @@
        "pHI": 0.90576,
        "pTS": 0.7346
    }
},
{
    "model": "reference_data.geneshet",
    "pk": 1,
    "fields": {
        "gene": 1,
        "shet": 0.90576,
        "shet_constrained": true
    }
},
{
    "model": "reference_data.dbnsfpgene",