Back end support for pool metrics #224
Merged
18 commits
59c2502 Put in some syntactic sugar checksum types to enable differentiation … (nerdstrike)
486faca Undo type fix, it seems to impact how the code works. Disturbing. (nerdstrike)
2b9dc32 Define a response model for pool metrics for a given well (nerdstrike)
3b3c32a Allow WellWh helper to compute pool metrics (nerdstrike)
b092e19 fixture parametrisation not quite right (nerdstrike)
4382aa7 An (untested) endpoint for fetching pool stats (nerdstrike)
7f329ff Make pool fixture self-cleaning (nerdstrike)
3c9b9bb Add metrics from mlwh to a multi-sample well, and test pool API endpoint (nerdstrike)
7a55cc6 parameterised fixture triggers unique condition in DB, so make more d… (nerdstrike)
686481b Stop fixture polluting other tests in module (nerdstrike)
2b6ae77 Data not needed for defunct mlwh column (nerdstrike)
3df42fa Update mlwh model to include new barcode4deplexing column (nerdstrike)
11be2e0 Supplement fixture with barcode IDs (nerdstrike)
6e5472a Add deplexing barcodes and modes to test data. Check deplexing mode t… (nerdstrike)
2f9be8a Added a check for unlinked data. (mgcam)
372fc56 Simplified getting linked lims data. (mgcam)
579138b Merge pull request #5 from mgcam/render_product_metrics (nerdstrike)
d8b755d Merge branch 'devel' into render_product_metrics (nerdstrike)
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, 2023 Genome Research Ltd.
+# Copyright (c) 2022, 2023, 2024 Genome Research Ltd.
 #
 # Authors:
 # Marina Gourtovaia <[email protected]>
@@ -21,6 +21,7 @@
 
 import logging
 from datetime import date, datetime, timedelta
+from statistics import mean, stdev
 from typing import ClassVar, List
 
 from pydantic import BaseModel, ConfigDict, Field
@@ -33,11 +34,13 @@
 )
 from lang_qc.db.mlwh_schema import PacBioRunWellMetrics
 from lang_qc.db.qc_schema import QcState, QcStateDict, QcType
+from lang_qc.models.pacbio.qc_data import QCPoolMetrics, SampleDeplexingStats
 from lang_qc.models.pacbio.well import PacBioPagedWells, PacBioWellSummary
 from lang_qc.models.pager import PagedResponse
 from lang_qc.models.qc_flow_status import QcFlowStatusEnum
 from lang_qc.models.qc_state import QcState as QcStateModel
 from lang_qc.util.errors import EmptyListOfRunNamesError, RunNotFoundError
+from lang_qc.util.type_checksum import PacBioWellSHA256
 
 """
 This package is using an undocumented feature of Pydantic, type
@@ -64,7 +67,7 @@ class WellWh(BaseModel):
     # The TestClient seems to be keeping these instances alive and changing them.
 
     def get_mlwh_well_by_product_id(
-        self, id_product: str
+        self, id_product: PacBioWellSHA256
     ) -> PacBioRunWellMetrics | None:
         """
         Returns a well row record from the well metrics table or
@@ -77,6 +80,52 @@ def get_mlwh_well_by_product_id(
             )
         ).scalar_one_or_none()
 
+    def get_metrics_by_well_product_id(
+        self, id_product: PacBioWellSHA256
+    ) -> QCPoolMetrics | None:
+        well = self.get_mlwh_well_by_product_id(id_product)
+        if well and well.demultiplex_mode and "Instrument" in well.demultiplex_mode:
+
+            product_metrics = well.pac_bio_product_metrics
+            lib_lims_data = [
+                product.pac_bio_run
+                for product in product_metrics
+                if product.pac_bio_run is not None
+            ]
+            if len(lib_lims_data) != len(product_metrics):
+                raise Exception("Partially linked LIMS data or no linked LIMS data")
+
+            cov: float | None
+            if any(p.hifi_num_reads is None for p in product_metrics):
+                cov = None
+            else:
+                hifi_reads = [prod.hifi_num_reads for prod in product_metrics]
+                cov = stdev(hifi_reads) / mean(hifi_reads) * 100
+
+            sample_stats = []
+            for (i, prod) in enumerate(product_metrics):
+                sample_stats.append(
+                    SampleDeplexingStats(
+                        id_product=prod.id_pac_bio_product,
+                        tag1_name=lib_lims_data[i].tag_identifier,
+                        tag2_name=lib_lims_data[i].tag2_identifier,
+                        deplexing_barcode=prod.barcode4deplexing,
+                        hifi_read_bases=prod.hifi_read_bases,
+                        hifi_num_reads=prod.hifi_num_reads,
+                        hifi_read_length_mean=prod.hifi_read_length_mean,
+                        hifi_bases_percent=prod.hifi_bases_percent,
+                        percentage_total_reads=(
+                            prod.hifi_num_reads / well.hifi_num_reads * 100
+                            if (well.hifi_num_reads and prod.hifi_num_reads)
+                            else None
+                        ),
+                    )
+                )
+
+            return QCPoolMetrics(pool_coeff_of_variance=cov, products=sample_stats)
+
+        return None
+
     def recent_completed_wells(self) -> List[PacBioRunWellMetrics]:
         """
         Get recent not QC-ed completed wells from the mlwh database.
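The two derived values in `get_metrics_by_well_product_id` (the pool coefficient of variance and each sample's share of the well's reads) can be tried out in isolation. What follows is a minimal sketch, assuming plain integers in place of the database rows. The function names `pool_cov_percent` and `percentage_of_total` are hypothetical, and the guards for fewer than two samples and for a zero mean are illustrative additions that are not part of this PR (`statistics.stdev` raises `StatisticsError` when given fewer than two data points).

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def pool_cov_percent(hifi_reads: Sequence[Optional[int]]) -> Optional[float]:
    """Hypothetical helper: standard deviation as a percentage of the mean."""
    # Mirrors the diff: any missing read count makes the value undefined.
    if any(n is None for n in hifi_reads):
        return None
    # Assumed extra guard: stdev() needs at least two data points.
    if len(hifi_reads) < 2:
        return None
    avg = mean(hifi_reads)
    # Assumed extra guard: avoid dividing by a zero mean.
    if avg == 0:
        return None
    return stdev(hifi_reads) / avg * 100


def percentage_of_total(
    sample_reads: Optional[int], well_reads: Optional[int]
) -> Optional[float]:
    """Hypothetical helper: a sample's share of the well's HiFi reads."""
    # Same truthiness test as the diff, so a count of 0 yields None, not 0.0.
    if well_reads and sample_reads:
        return sample_reads / well_reads * 100
    return None


if __name__ == "__main__":
    print(pool_cov_percent([1200, 900, 1500]))
    print(percentage_of_total(900, 3600))
```

Whether a single-sample pool should report `None` or raise is a design decision for the maintainers; the sketch only shows where such a guard would sit.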
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, 2023 Genome Research Ltd.
+# Copyright (c) 2022, 2023, 2024 Genome Research Ltd.
 #
 # Authors:
 # Marina Gourtovaia <[email protected]>
@@ -23,6 +23,7 @@
 from pydantic import BaseModel, ConfigDict, Field
 
 from lang_qc.db.mlwh_schema import PacBioRunWellMetrics
+from lang_qc.util.type_checksum import PacBioProductSHA256
 
 
 # Pydantic prohibits us from defining these as @classmethod or @staticmethod
@@ -153,3 +154,32 @@ def from_orm(cls, obj: PacBioRunWellMetrics):
             qc_data[name]["value"] = getattr(obj, name, None)
 
         return cls.model_validate(qc_data)
+
+
+class SampleDeplexingStats(BaseModel):
+    """
+    A representation of metrics for one product, some direct from the DB and others inferred
+
+    For a long time tag2_name was null and tag1_name was silently used at both ends of the sequence.
+    As a result tag2_name will be None for most data in or before 2024.
+    """
+
+    id_product: PacBioProductSHA256
+    tag1_name: str | None
+    tag2_name: str | None
+    deplexing_barcode: str | None
+    hifi_read_bases: int | None
+    hifi_num_reads: int | None
+    hifi_read_length_mean: float | None
+    hifi_bases_percent: float | None
+    percentage_total_reads: float | None
+
+
+class QCPoolMetrics(BaseModel):
+    pool_coeff_of_variance: float | None = Field(
+        title="Coefficient of variance for reads in the pool",
+        description="Percentage of the standard deviation w.r.t. mean, when pool is more than one",
+    )
+    products: list[SampleDeplexingStats] = Field(
+        title="List of products and their metrics"
+    )
Other QC metrics in lang_qc/models/pacbio/qc_data.py have class methods to populate themselves. It might be reasonable to move this code to such a class method.
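
One possible reading of this suggestion is a factory class method on the pool metrics model itself. The sketch below is only an illustration: it uses standard-library dataclasses in place of the Pydantic models and SQLAlchemy rows so that it runs on its own, and the names `Product`, `SampleStats`, `PoolMetrics` and `from_products`, as well as the reduced field set, are hypothetical and not taken from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional


@dataclass
class Product:
    """Hypothetical stand-in for one product metrics row."""

    id_product: str
    hifi_num_reads: Optional[int]


@dataclass
class SampleStats:
    """Hypothetical, reduced stand-in for SampleDeplexingStats."""

    id_product: str
    hifi_num_reads: Optional[int]
    percentage_total_reads: Optional[float]


@dataclass
class PoolMetrics:
    """Hypothetical, reduced stand-in for QCPoolMetrics."""

    pool_coeff_of_variance: Optional[float]
    products: list[SampleStats]

    @classmethod
    def from_products(
        cls, well_reads: Optional[int], products: list[Product]
    ) -> PoolMetrics:
        """Build the pool metrics from raw per-product values."""
        reads = [p.hifi_num_reads for p in products]
        cov = None
        # Assumed guards: at least two samples, no missing counts, non-zero mean.
        if len(reads) > 1 and all(r is not None for r in reads) and mean(reads) > 0:
            cov = stdev(reads) / mean(reads) * 100
        stats = [
            SampleStats(
                id_product=p.id_product,
                hifi_num_reads=p.hifi_num_reads,
                percentage_total_reads=(
                    p.hifi_num_reads / well_reads * 100
                    if (well_reads and p.hifi_num_reads)
                    else None
                ),
            )
            for p in products
        ]
        return cls(pool_coeff_of_variance=cov, products=stats)


if __name__ == "__main__":
    pool = PoolMetrics.from_products(100, [Product("a", 25), Product("b", 75)])
    print(pool.pool_coeff_of_variance, pool.products)
```

With this shape the database helper would only fetch the well and hand its rows to the model, which keeps the arithmetic next to the fields it populates.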