update individuals with sample qc #4663

jklugherz · 2025-02-20T00:29:58Z

much of the logic is based on this function, but I tried to make fewer db queries

hanars · 2025-02-20T18:53:24Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

+            i.sample.sample_id: i for i in Individual.objects.filter(
+                family__project__in=projects,
+                sample__sample_id__in=sample_qc_map.keys(),
+            ).select_related('sample')


select_related is generally an inefficient way to query things (although there are still a lot of these floating around in old code), so it's usually a sign that the database query is not written efficiently.
Also, while it's not obvious in this part of the code, for hail data the Sample sample_id is always identical to the corresponding Individual's individual_id.
Also, although we are not currently using it, match_and_update_search_samples returns updated_samples, which is a queryset of all the sample models that were included in this round of loading. We should be able to safely assume that the sample QC data will only be included for those samples, so this can be rewritten as follows:

sample_id_individual_map = { i.individual_id: i for i in Individual.objects.filter(sample__in=updated_samples) }

I was considering using updated_samples! glad to hear I was on the right track, thanks for the suggestion.

hanars · 2025-02-20T18:54:48Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

+        unknown_pop_filter_flags = set()
+
+        for sample_id, individual in sample_individual_map.items():
+            record = sample_qc_map[sample_id]


I think this would be more stable if you iterate through sample_qc_map and get the corresponding individual model from sample_id_individual_map, rather than doing it in this order
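A minimal sketch of the suggested iteration order, using plain dicts in place of the real models. In the original order, `sample_qc_map[sample_id]` raises a `KeyError` for any individual that has no QC record; iterating over the QC records instead makes the missing-individual case explicit. The function name `pair_qc_records_with_individuals` and the way missing samples are collected are assumptions for illustration, not code from this PR.

```python
def pair_qc_records_with_individuals(sample_qc_map, sample_id_individual_map):
    """Walk the QC records and look up the matching individual for each one.

    Both arguments are plain dicts keyed by sample ID (hypothetical stand-ins
    for the structures discussed in the review).
    """
    matched = []
    missing = []
    for sample_id, record in sample_qc_map.items():
        individual = sample_id_individual_map.get(sample_id)
        if individual is None:
            # QC data arrived for a sample with no loaded individual
            missing.append(sample_id)
            continue
        matched.append((individual, record))
    return matched, missing


# Usage with toy data: 'S2' has a QC record but no individual
matched, missing = pair_qc_records_with_individuals(
    {'S1': {'filter_flags': '[]'}, 'S2': {'filter_flags': '[]'}},
    {'S1': 'individual-1'},
)
```

How the caller handles `missing` (skip, log, or fail) is left open here.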

hanars · 2025-02-20T18:55:24Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

+        for sample_id, individual in sample_individual_map.items():
+            record = sample_qc_map[sample_id]
+            filter_flags = {}
+            for flag in json.loads(record['filter_flags']):


I'm assuming this is essentially copy-paste from the existing qc code. Let me know if you made any substantive changes?

no substantive changes here !

hanars · 2025-02-20T19:22:47Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

+            updated_individuals.append(individual)
+
+        if updated_individuals:
+            Individual.objects.bulk_update(


It's better to use the Individual.bulk_update_models helper, as it includes audit logging by default.

hanars · 2025-02-20T19:24:33Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

+        if unknown_filter_flags:
+            message = 'The following filter flags have no known corresponding value and were not saved: {}'.format(
+                ', '.join(unknown_filter_flags))
+            logger.warning(message)


Warnings are useful when something is triggered in the UI and the person who triggered it will be shown the warnings, but they are essentially useless in a job that runs automatically in the background.
I think we should hard fail updating QC if there are unexpected values, as we now have more control over how these are generated and can choose not to include things we don't care about in our metadata.

I agree here, this was a copy paste without critical thinking. There's actually no way we'd have erroneous filter flags, and for the hail qc_metrics_filters, which are set by hail's sample qc, we can trust that the output has the same flags every time, so I'm leaning towards just taking the flag validation out of the new seqr sample qc flow.
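For comparison, the hard-fail option raised above could look roughly like the following. This is only a sketch: the `FILTER_FLAG_TO_VALUE` table, its entries, and the `parse_filter_flags` name are hypothetical, and the real flag names and stored values are not shown in this thread. The only detail taken from the diff is that `record['filter_flags']` is a JSON string.

```python
import json

# Hypothetical lookup table, invented for this example
FILTER_FLAG_TO_VALUE = {
    'contamination': 'contamination',
    'coverage': 'coverage',
}


def parse_filter_flags(record):
    """Return the known values for a record's flags, raising on unknown flags."""
    flags = json.loads(record['filter_flags'])
    unknown = sorted(flag for flag in flags if flag not in FILTER_FLAG_TO_VALUE)
    if unknown:
        # Fail loudly instead of logging a warning nobody will read
        raise ValueError('Unknown filter flags: ' + ', '.join(unknown))
    return [FILTER_FLAG_TO_VALUE[flag] for flag in flags]
```

Raising here means a background run stops the QC update rather than silently dropping values.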

hanars · 2025-02-24T16:00:00Z

seqr/management/commands/check_for_new_samples_from_pipeline.py

@@ -242,6 +249,10 @@ def _load_new_samples(cls, metadata_path, genome_version, dataset_type, run_vers
        except Exception as e:
            logger.error(f'Error reporting loading failure for {run_version}: {e}')

+        # Update sample qc
+        if 'sample_qc' in metadata:
+            cls.update_individuals_sample_qc(sample_type, updated_samples, metadata['sample_qc'])


We should wrap this in a try/except block that logs an error, because if something goes wrong we would still want the rest of the job to proceed.
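A rough sketch of that pattern, written as a standalone function so it can run outside the management command. The `run_sample_qc_step` name, the `update_fn` callback, and the log message are assumptions for illustration; only the `'sample_qc' in metadata` check and the use of `logger.error` come from the diff above.

```python
import logging

logger = logging.getLogger(__name__)


def run_sample_qc_step(metadata, update_fn):
    """Run the optional QC update without letting a failure abort the caller.

    Returns True if the update ran successfully, False if it was skipped
    or failed.
    """
    if 'sample_qc' not in metadata:
        return False
    try:
        update_fn(metadata['sample_qc'])
    except Exception as e:
        # Log and continue so the rest of the job still proceeds
        logger.error(f'Error updating sample QC: {e}')
        return False
    return True
```

Catching the broad `Exception` is deliberate in this sketch, since the stated goal is that no QC failure should stop the rest of the job.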

first pass update individuals with sample qc

8a567f8

jklugherz requested a review from hanars February 20, 2025 18:12

hanars reviewed Feb 20, 2025

jklugherz added 2 commits February 20, 2025 17:13

comments, testing, a few changes

774d2cb

update test and error checking

93cbbb7

jklugherz changed the title ~~first pass update individuals with sample qc~~ update individuals with sample qc Feb 21, 2025

jklugherz added 2 commits February 21, 2025 14:35

order by in test

eaf7232

no print statements in production

ef7bc96

jklugherz marked this pull request as ready for review February 21, 2025 19:37

jklugherz requested a review from hanars February 21, 2025 19:37

hanars reviewed Feb 24, 2025

jklugherz added 4 commits February 24, 2025 11:08

try/except

a9fd30f

underscore

87534a5

pass in individuals instead of updated_samples, better for manage com…

1830984

…mand

merge dev

3aad126
