Hail backend - SV WGS #3574

hanars · 2023-08-22T19:02:04Z

Getting this up early for review; it still needs unit tests.

bpblanken · 2023-08-23T18:04:46Z

hail_search/hail_search_query.py

+            'response_key': 'transcripts',
+            'empty_array': True,
+            'format_value': lambda value: value.rename({k: _to_camel_case(k) for k in value.keys()}),
+            'format_values': lambda values: values.group_by(lambda t: t.geneId),


Might want to name this function something other than format_values. Having format_value and format_values both be present but have different behavior is a bit surprising.
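To make the distinction concrete, here is a minimal plain-Python sketch of the two steps: one formats a single record, the other regroups the whole collection. It is an illustration only, not the PR's Hail code. `to_camel_case` is a hypothetical stand-in for the PR's `_to_camel_case` (whose body is not shown here), and `group_by_gene` is just one possible name for the collection-level step.

```python
from collections import defaultdict


def to_camel_case(snake: str) -> str:
    # Hypothetical stand-in for the PR's _to_camel_case helper.
    first, *rest = snake.split('_')
    return first + ''.join(word.capitalize() for word in rest)


def format_value(value: dict) -> dict:
    # Per-element step: rename the keys of one transcript record.
    return {to_camel_case(k): v for k, v in value.items()}


def group_by_gene(values: list) -> dict:
    # Whole-collection step: bucket the formatted records by geneId.
    grouped = defaultdict(list)
    for value in values:
        grouped[value['geneId']].append(value)
    return dict(grouped)


# Illustrative records; the field values are made up.
transcripts = [
    {'gene_id': 'ENSG1', 'major_consequence': 'LOF'},
    {'gene_id': 'ENSG1', 'major_consequence': 'INTRONIC'},
    {'gene_id': 'ENSG2', 'major_consequence': 'LOF'},
]
result = group_by_gene([format_value(t) for t in transcripts])
```

With distinct names, a reader can tell at a glance that one callable maps over elements while the other changes the shape of the result.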

bpblanken · 2023-08-23T18:10:25Z

hail_search/hail_search_query.py

+        'protein_consequence': lambda r: [hl.min(r.sorted_gene_consequences.map(lambda g: g.major_consequence_id))],
+        'size': lambda r: [hl.if_else(
+            r.start_locus.contig == r.end_locus.contig, r.start_locus.position - r.end_locus.position, -50,
+        )],


Maybe:

SORTS = {
    'protein_consequence': ...,
    ...
    **BaseHailTableQuery.SORTS,
}
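A small plain-Python sketch of that dict-unpacking pattern follows. `BaseQuery` and `SvQuery` are hypothetical stand-ins for `BaseHailTableQuery` and `SvHailTableQuery`, and the rows are ordinary dicts rather than Hail structs.

```python
class BaseQuery:
    # Hypothetical stand-in for BaseHailTableQuery.
    SORTS = {'xpos': lambda r: [r['xpos']]}


class SvQuery(BaseQuery):
    # Subclass-specific sorts, with the base class sorts merged in.
    SORTS = {
        'size': lambda r: [r['start'] - r['end']],
        **BaseQuery.SORTS,
    }


row = {'xpos': 1000123, 'start': 100, 'end': 350}
sort_keys = {name: fn(row) for name, fn in SvQuery.SORTS.items()}
```

The unpacking builds a new dict, so `BaseQuery.SORTS` itself is left unchanged. If the same key appears in both places, the entry listed later wins, so the position of the `**` line only matters when keys overlap.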

bpblanken · 2023-08-23T18:16:03Z

hail_search/hail_search_query.py

+        return [gene_ranks.get(r.selected_transcript.gene_id)] + super()._gene_rank_sort(r, gene_ranks)
+
+
+class SvHailTableQuery(BaseHailTableQuery):


could these subclasses be in separate files?

They could be. I'll make a follow-up PR to move everything, to keep the diff more manageable on this and the gcnv PR.

bpblanken · 2023-08-23T18:43:31Z

hail_search/hail_search_query.py

+                (ht.alleles == [ref, alt])
+                for chrom, pos, ref, alt in variant_ids
+            ]
+            variant_id_q = variant_id_qs[0]


this looks like another hl.any 😜
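The pattern being pointed at, seeding a result with the first condition and OR-ing in the rest, can be pictured in plain Python. This is an analogy only, not Hail code: the built-in `any` plays the role the reviewer suggests for `hl.any`, and the variant IDs and row layout are made up for illustration.

```python
from functools import reduce
from operator import or_

# Illustrative variant IDs as (chrom, pos, ref, alt); not taken from the PR.
variant_ids = [('1', 100, 'A', 'T'), ('2', 200, 'G', 'C')]


def _checks(row):
    # One boolean per requested variant ID.
    return [
        row['chrom'] == chrom and row['pos'] == pos and row['alleles'] == [ref, alt]
        for chrom, pos, ref, alt in variant_ids
    ]


def matches_folded(row):
    # Manual fold: seed with the first condition, then OR in the rest.
    checks = _checks(row)
    return reduce(or_, checks[1:], checks[0])


def matches_any(row):
    # Same result without the explicit seed-and-fold.
    return any(_checks(row))


hit = {'chrom': '2', 'pos': 200, 'alleles': ['G', 'C']}
miss = {'chrom': '2', 'pos': 200, 'alleles': ['G', 'A']}
```

In this plain-Python version the folded form also needs at least one condition to index into, while `any` simply returns `False` for an empty list.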

bpblanken · 2023-08-23T18:53:34Z

hail_search/hail_search_query.py

+                },
+            ),
+            # For insertions, end_locus represents the svSourceDetail, otherwise represents the endChrom
+            'endChrom': lambda r: hl.or_missing(r.sv_type_id != insertion_type_id, get_end_chrom(r)),


I had pretty good luck with just defining a hail expression as a variable and re-using:

end_chrom = get_end_chrom(r)
...
'endChrom': lambda r: hl.or_missing(r.sv_type_id != insertion_type_id, end_chrom),
'svSourceDetail': lambda r: hl.or_missing(r.sv_type_id == insertion_type_id, hl.or_missing(hl.is_defined(end_chrom), hl.struct(chrom=end_chrom)))

The problem is that r is not defined above, where you do end_chrom = get_end_chrom(r); it's only available in the lambda.
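The scoping point can be shown in plain Python: because `r` only exists as the lambda parameter, shared logic has to live in a function that each lambda calls with its own `r`. The sketch below uses ordinary dicts in place of Hail rows. The body of `get_end_chrom`, the `'INS'` type value, and the field names are all assumptions made for illustration.

```python
INSERTION = 'INS'  # hypothetical SV type value


def get_end_chrom(r):
    # Hypothetical helper: report the end contig only when it differs
    # from the start contig.
    return r['end_contig'] if r['end_contig'] != r['start_contig'] else None


# r is bound only when a lambda is called, so the helper is invoked
# inside each lambda instead of being precomputed outside.
ANNOTATIONS = {
    'endChrom': lambda r: None if r['sv_type'] == INSERTION else get_end_chrom(r),
    'svSourceDetail': lambda r: (
        {'chrom': get_end_chrom(r)}
        if r['sv_type'] == INSERTION and get_end_chrom(r) is not None
        else None
    ),
}

row = {'sv_type': 'INS', 'start_contig': '1', 'end_contig': '5'}
formatted = {key: fn(row) for key, fn in ANNOTATIONS.items()}
```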

ShifaSZ · 2023-08-23T22:05:33Z

hail_search/hail_search_query.py

+    NESTED_GENOTYPE_FIELDS = {'concordance': ['new_call', 'prev_call', 'prev_num_alt']}
+    GENOTYPE_QUERY_FIELDS = {'gq_sv': 'GQ', 'gq': None}
+
+    TRANSCRIPTS_FIELD = 'sorted_gene_consequences'


We used to use sorted_transcript_consequences for both SNPs and SVs. I'm unsure why we use gene instead of transcript for SVs.

the v3 pipeline has a different data model

ShifaSZ · 2023-08-24T01:47:47Z

hail_search/hail_search_query.py

+        elif not self._load_table_kwargs.get('_intervals'):
            ht = self._prefilter_entries_table(ht, **kwargs)


The SV doesn't implement interval filtering. Why?

The SNP entry tables are keyed by locus, so we can filter them by interval at the time we read them in. The SV tables are keyed by variant ID and the entry tables do not have a locus, so there is no way to filter there. Therefore, for SVs the interval filtering happens after joining with the annotation table: https://github.com/broadinstitute/seqr/pull/3574/files#diff-43e8b0dcfc307385b78fcb0683e55b20ddfb4c0f85c1794aa39b33e0a23322e8R1093
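A rough plain-Python picture of that ordering, with dicts standing in for the Hail tables: the entries carry no position, so the interval check can only run once the annotations have been joined in. The field names, the sample data, and the overlap rule are assumptions for illustration, not the PR's implementation.

```python
# Entry "table": keyed by variant ID, no locus available.
entries = {
    'sv1': {'GT': 1},
    'sv2': {'GT': 2},
    'sv3': {'GT': 1},
}

# Annotation "table": keyed by variant ID, carries the position.
annotations = {
    'sv1': {'contig': '1', 'start': 100, 'end': 500},
    'sv2': {'contig': '1', 'start': 5000, 'end': 9000},
    'sv3': {'contig': '2', 'start': 100, 'end': 500},
}

intervals = [('1', 0, 1000)]  # (contig, start, end), illustrative


def overlaps(row, interval):
    # Assumed rule: same contig and any overlap of the two ranges.
    contig, start, end = interval
    return row['contig'] == contig and row['start'] <= end and row['end'] >= start


# Step 1: join on variant ID. Step 2: filter by interval on the joined rows.
joined = {vid: {**entries[vid], **annotations[vid]} for vid in entries if vid in annotations}
filtered = {
    vid: row for vid, row in joined.items()
    if any(overlaps(row, interval) for interval in intervals)
}
```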

ShifaSZ · 2023-08-24T02:15:49Z

hail_search/hail_search_query.py

+    POPULATIONS = {
+        'sv_callset': {'hemi': None, 'sort': 'callset_af'},
+        'gnomad_svs': {'id': 'ID', 'ac': None, 'an': None, 'hom': None, 'hemi': None, 'het': None, 'sort': 'gnomad'},
+    }
+    POPULATION_FIELDS = {'sv_callset': 'gt_stats'}
+    PREDICTION_FIELDS_CONFIG = {
+        'strvctvre': PredictionPath('strvctvre', 'score'),


I didn't realize that the SVs have so few populations and predictions.

ShifaSZ · 2023-08-24T02:31:09Z

hail_search/hail_search_query.py

+    }
+
+    SORTS = {
+        'protein_consequence': lambda r: [hl.min(r.sorted_gene_consequences.map(lambda g: g.major_consequence_id))],


Use r[TRANSCRIPTS_FIELD] for r.sorted_gene_consequences?

It's not in scope here, so it would have to be r[SvHailTableQuery.TRANSCRIPTS_FIELD], which is less readable. The purpose of that variable is to support methods in the base class, not so much to use it in every possible location.
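The scoping behaviour behind this reply can be reproduced in plain Python: names assigned in a class body are not visible from inside functions or lambdas defined in that body, so a bare `TRANSCRIPTS_FIELD` inside the lambda fails when the lambda is called. `SvQuery` is a hypothetical stand-in for `SvHailTableQuery`, and `BARE_SORTS` exists only to show the failure.

```python
class SvQuery:
    TRANSCRIPTS_FIELD = 'sorted_gene_consequences'

    # The lambda body is its own scope and skips the class scope, so the
    # bare name is looked up globally and fails when the lambda runs.
    BARE_SORTS = {
        'protein_consequence': lambda r: [min(r[TRANSCRIPTS_FIELD])],
    }

    # Qualifying with the class name works, at the cost of readability.
    SORTS = {
        'protein_consequence': lambda r: [min(r[SvQuery.TRANSCRIPTS_FIELD])],
    }


row = {'sorted_gene_consequences': [3, 1, 2]}

try:
    SvQuery.BARE_SORTS['protein_consequence'](row)
    bare_error = None
except NameError as error:
    bare_error = type(error).__name__

qualified = SvQuery.SORTS['protein_consequence'](row)
```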

ShifaSZ · 2023-08-24T02:31:25Z

hail_search/hail_search_query.py

+        return super()._get_allowed_consequences_annotations(annotations, annotation_filters)
+
+    def _get_consequence_filter(self, allowed_consequence_ids, annotation_exprs):
+        return self._ht.sorted_gene_consequences.any(


Same as above

hanars added 21 commits August 16, 2023 18:47

Merge branch 'hail-backend-cache-globals' of https://github.com/broad…

786156f

…institute/seqr into hail-backend-sv

format response fields for gcnv

2672618

clean up merged logic

7ecc2a6

shared variant behavior cleanup

8018d9a

clean up

a5b4a01

gcnv annotation search

b5eab81

clean up

5c935fb

handle sv type formatting

13f6510

Merge branch 'dev' of https://github.com/broadinstitute/seqr into hai…

f87ad45

…l-backend-gcnv

clean up

1786b23

nested concordance

9a1c169

format enm for transcripts

fdcd187

clean up annotation formatting

6bf948c

derived enum fields

0e7200b

cpxIntervals placeholder

d4a10b4

search cleanup

81969f5

updae sv wgs new data format

9a73335

share sv concordance

5c8526a

clean up

b0b1ef8

gcnv cleanup

c55f15e

clean up

68297fa

hanars requested review from bpblanken and ShifaSZ August 22, 2023 19:02

hanars added 2 commits August 22, 2023 16:17

codacy fix

e9ce30c

add sv wgs test fixtures

aa97d36

bpblanken reviewed Aug 23, 2023

hanars added 3 commits August 23, 2023 15:39

add fixture data

9ebcd65

pr feedback

90815d1

fix 37 locus end and x linked search

04eb049

ShifaSZ suggested changes Aug 24, 2023

hanars added 5 commits August 24, 2023 11:32

fix sv secondary annotation filtering

da8238f

pr feedback

e247080

sort tests

e4a56c6

test new call filter

e4b6953

Merge branch 'dev' of https://github.com/broadinstitute/seqr into hai…

4cdc65b

…l-backend-sv-wgs

hanars requested review from ShifaSZ and bpblanken August 24, 2023 17:00

bpblanken approved these changes Sep 5, 2023

hanars merged commit cef2cbf into dev Sep 5, 2023
4 checks passed
