feat!: update to vrs 2.0 models #166

Merged
merged 89 commits into from
Aug 2, 2024
Changes from 3 commits
442083c
build!: remove vrsatile
katiestahl Jul 17, 2024
4a538bb
wip: remove gene descriptor
katiestahl Jul 17, 2024
c0e8626
wip: remove gene descriptor
katiestahl Jul 17, 2024
24e1a4c
progress updating models and adding back gene element wrapper
katiestahl Jul 17, 2024
3e52ff2
adding back gene element
katiestahl Jul 17, 2024
12ee931
Revert "progress updating models and adding back gene element wrapper"
katiestahl Jul 17, 2024
6120473
Revert "adding back gene element"
katiestahl Jul 17, 2024
6573780
converting descriptors
katiestahl Jul 18, 2024
44e0574
remove todo
katiestahl Jul 18, 2024
7796732
wip: adding back GeneElement wrapper, updating to camelCase, removing…
katiestahl Jul 18, 2024
c67d588
updating models
katiestahl Jul 18, 2024
6ad8e33
fix: gene element type
katiestahl Jul 18, 2024
c1e8fad
wip: update constructors with updated param names from models
katiestahl Jul 18, 2024
42d1224
Merge branch 'main' into issue-95-take2
katiestahl Jul 18, 2024
7eabc25
update constructors from model changes
katiestahl Jul 18, 2024
6cd7cfa
Merge branch 'issue-95-take2' of https://github.com/cancervariants/fu…
katiestahl Jul 18, 2024
1e18144
minor fixes
katiestahl Jul 18, 2024
76ef031
fix: updating variable casing
katiestahl Jul 18, 2024
6f740e9
updating docstring
katiestahl Jul 18, 2024
c928b35
fix: variable casing and error messages
katiestahl Jul 18, 2024
a2d2e10
revert featureId back to string
katiestahl Jul 18, 2024
fe85297
Update src/fusor/models.py
katiestahl Jul 19, 2024
eb9da54
Update pyproject.toml
katiestahl Jul 19, 2024
bbae4bc
Update pyproject.toml
katiestahl Jul 19, 2024
7634327
Update src/fusor/models.py
katiestahl Jul 19, 2024
a593c3b
fixes from pr comments
katiestahl Jul 19, 2024
1c3959b
fixes from pr comments
katiestahl Jul 19, 2024
c8c90a7
Merge branch 'issue-95-take2' of https://github.com/cancervariants/fu…
katiestahl Jul 19, 2024
838e2a2
wip: updating test examples with new models
katiestahl Jul 19, 2024
f5f5689
adding back unreachable else because ruff will complain otherwise
katiestahl Jul 19, 2024
c3136ce
fix: update example models with placeholders for sequence location an…
katiestahl Jul 19, 2024
686b000
fix: casing for data to/from cool-seq-tool
katiestahl Jul 22, 2024
1ce7f23
Update src/fusor/fusor.py
katiestahl Jul 22, 2024
485035c
Update src/fusor/fusor.py
katiestahl Jul 22, 2024
9f1ee60
fix: minimal gene response when creating gene
katiestahl Jul 22, 2024
5bbc77c
fix: naming
katiestahl Jul 22, 2024
5a8bea2
Update src/fusor/models.py
katiestahl Jul 22, 2024
ca97171
Update src/fusor/models.py
katiestahl Jul 22, 2024
92fe98f
Update src/fusor/models.py
katiestahl Jul 22, 2024
43f347f
Update src/fusor/models.py
katiestahl Jul 22, 2024
dba14ca
Update src/fusor/fusor.py
katiestahl Jul 22, 2024
a3e67d1
Update src/fusor/models.py
katiestahl Jul 22, 2024
74db34c
updating constructor for SequenceLocation and adding SequenceReference
katiestahl Jul 22, 2024
9cd280c
Merge branch 'issue-95-take2' of https://github.com/cancervariants/fu…
katiestahl Jul 22, 2024
cce6159
removing comment
katiestahl Jul 22, 2024
58b0899
wip: start updates to nomenclature using new models
katiestahl Jul 22, 2024
634ad7f
wip: progress on sequence location constructor
katiestahl Jul 22, 2024
7334ac8
fix: tests and add sequence location id
katiestahl Jul 22, 2024
9d08adf
wip: update test examples
katiestahl Jul 22, 2024
fc649b5
removing incorrect test cases- adding placeholders for now
katiestahl Jul 22, 2024
1e0c7df
fix constructing sequence location
katiestahl Jul 22, 2024
33e932b
fix: casing for sequencelocation
katiestahl Jul 22, 2024
091e23a
updating sequence locations examples
katiestahl Jul 22, 2024
1fbee7c
updating tests
katiestahl Jul 22, 2024
c873d6a
updating tests and adding option to getch gene id from alternate field
katiestahl Jul 22, 2024
ac72ecd
fix: json schema examples
katiestahl Jul 22, 2024
3a9c20a
wip: updating fusor tests
katiestahl Jul 22, 2024
ab3de62
update nomenclature to use new models
katiestahl Jul 22, 2024
6bf5f01
remove completed todo
katiestahl Jul 22, 2024
b952c9c
wip: updating fusor tests
katiestahl Jul 23, 2024
16def6b
Update locations for mane transcript segment fixture/tests
jarbesfeld Jul 23, 2024
a624de5
wip: update tests with new models
katiestahl Jul 23, 2024
620dc1a
Merge branch 'issue-95-take2' of https://github.com/cancervariants/fu…
katiestahl Jul 24, 2024
6a6211b
update tests and examples with new models
katiestahl Jul 24, 2024
3fd9615
update tests and examples with new models
katiestahl Jul 24, 2024
9af8c72
update tests and examples with new models
katiestahl Jul 24, 2024
871c3d9
update tests and examples with new models
katiestahl Jul 24, 2024
01ef722
refactor: moving around logic to make more readable
katiestahl Jul 24, 2024
b811599
model updates
katiestahl Jul 24, 2024
812a21b
Merge branch 'main' into issue-95-take2
katiestahl Jul 24, 2024
9c083b9
model updates
katiestahl Jul 24, 2024
db4d6cb
updating models and tests
katiestahl Jul 24, 2024
6c3b663
fix name
katiestahl Jul 24, 2024
c78ee2a
pin gene normalizer version where CURIE is still defined
katiestahl Jul 25, 2024
fb05793
updating json schema examples for models, removing labael from sequen…
katiestahl Jul 25, 2024
2572534
remove sequencelocation label
katiestahl Jul 25, 2024
62745b8
fix ruff errors
katiestahl Jul 25, 2024
8732135
pinning pydantic version to stop validation error in tests
katiestahl Jul 25, 2024
acab073
test: updating test to fail with unexpected sequence id provided
katiestahl Jul 25, 2024
a0a4529
Update test fixtures for correct use of start and end"
jarbesfeld Jul 25, 2024
f1b82a0
update comment
katiestahl Jul 25, 2024
e9a70b1
Revert "Update test fixtures for correct use of start and end""
katiestahl Jul 25, 2024
dd3f7c1
Update src/fusor/models.py
katiestahl Jul 29, 2024
774f559
Update src/fusor/fusor.py
katiestahl Jul 29, 2024
8fd8d3f
Update src/fusor/fusor.py
katiestahl Jul 29, 2024
c664b77
fix: example data
katiestahl Jul 29, 2024
f00ddb2
Merge branch 'issue-95-take2' of https://github.com/cancervariants/fu…
katiestahl Jul 29, 2024
2880e18
update fusion constructor to accept the body of a valid fusion (same …
katiestahl Jul 30, 2024
b9bff00
Revert "update fusion constructor to accept the body of a valid fusio…
katiestahl Jul 30, 2024
5 changes: 2 additions & 3 deletions pyproject.toml
@@ -26,10 +26,9 @@ description = "Computable object representation and validation for gene fusions"
license = {file = "LICENSE"}
dependencies = [
"pydantic == 2.*",
"ga4gh.vrsatile.pydantic ~=0.2.0",
"ga4gh.vrs ~=0.8.1",
"ga4gh.vrs ~=2.0.0a8",
"biocommons.seqrepo",
"gene-normalizer ~=0.1.40-dev1",
"gene-normalizer ~=0.4.0",
"cool-seq-tool ~=0.5.0",
]
dynamic=["version"]
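The dependency pins above all use the `~=` (compatible release) operator. As a rough illustration of what such a pin allows, the sketch below expands a specifier into its equivalent `>=` / `==` pair. The helper name `expand_compatible_release` is hypothetical, and the parsing is deliberately simplified (it handles only a plain dotted release with an optional suffix); it is not how pip or any packaging tool implements this.

```python
import re


def expand_compatible_release(spec: str) -> str:
    """Expand a ``~=`` specifier into an equivalent ``>=`` / ``==`` pair.

    Hypothetical helper for illustration only; simplified parsing.
    """
    version = spec.removeprefix("~=").strip()
    # Keep only the dotted numeric release segment, e.g. "2.0.0" from "2.0.0a8"
    release = re.match(r"\d+(\.\d+)*", version).group(0)
    parts = release.split(".")
    if len(parts) < 2:
        raise ValueError("compatible release needs at least two release segments")
    # Drop the last release segment and wildcard it
    prefix = ".".join(parts[:-1])
    return f">={version}, =={prefix}.*"


vrs_pin = expand_compatible_release("~=2.0.0a8")
gene_pin = expand_compatible_release("~=0.4.0")
```

Under this reading, the new `ga4gh.vrs` pin accepts later `2.0.*` releases but not `2.1`, and the `gene-normalizer` pin stays within `0.4.*`.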
60 changes: 23 additions & 37 deletions src/fusor/fusor.py
@@ -8,18 +8,13 @@
from cool_seq_tool.app import CoolSeqTool
from cool_seq_tool.schemas import ResidueMode
from ga4gh.core import ga4gh_identify
from ga4gh.core.domain_models import Gene
from ga4gh.vrs import models
from ga4gh.vrsatile.pydantic.vrs_models import (
CURIE,
Number,
SequenceInterval,
SequenceLocation,
VRSTypes,
)
from ga4gh.vrsatile.pydantic.vrsatile_models import GeneDescriptor, LocationDescriptor
from ga4gh.vrs.models import SequenceLocation
from gene.database import AbstractDatabase as GeneDatabase
from gene.database import create_db
from gene.query import QueryHandler
from gene.schemas import CURIE
from pydantic import ValidationError

from fusor.exceptions import FUSORParametersException, IDTranslationException
@@ -100,7 +95,7 @@ def _contains_element_type(kwargs: dict, elm_type: StructuralElementType) -> boo
def fusion(self, fusion_type: FusionType | None = None, **kwargs) -> Fusion:
"""Construct fusion object.

:param fusion_type: explicitly specify fusion type. Unecessary if providing
:param fusion_type: explicitly specify fusion type. Unnecessary if providing
fusion object in keyword args that includes ``type`` attribute.
:return: constructed fusion object if successful
:raise: FUSORParametersException if fusion type unable to be determined,
@@ -292,7 +287,7 @@ async def transcript_segment_element(
exon_end=genomic_data.exon_end,
exon_end_offset=genomic_data.exon_end_offset,
gene_descriptor=normalized_gene_response[0],
element_genomic_start=self._location_descriptor(
element_genomic_start=self._sequence_location(
genomic_data.start,
genomic_data.start + 1,
genomic_data.chr,
@@ -301,7 +296,7 @@ async def transcript_segment_element(
)
if genomic_data.start
else None,
element_genomic_end=self._location_descriptor(
element_genomic_end=self._sequence_location(
genomic_data.end,
genomic_data.end + 1,
genomic_data.chr,
@@ -361,7 +356,7 @@ def templated_sequence_element(
if residue_mode == ResidueMode.RESIDUE:
start -= 1

region = self._location_descriptor(
region = self._sequence_location(
start,
end,
sequence_id,
@@ -468,7 +463,7 @@ def functional_domain(
if not gene_descr:
return None, warning

loc_descr = self._location_descriptor(
loc_descr = self._sequence_location(
start, end, sequence_id, seq_id_target_namespace=seq_id_target_namespace
)

@@ -519,15 +514,15 @@ def regulatory_element(
_logger.warning(msg)
return None, msg

def _location_descriptor(
def _sequence_location(
self,
start: int,
end: int,
sequence_id: str,
label: str | None = None,
seq_id_target_namespace: str | None = None,
use_location_id: bool = False,
) -> LocationDescriptor:
) -> SequenceLocation:
"""Create location descriptor

:param start: Start position
@@ -564,22 +559,11 @@ def _location_descriptor(
else:
sequence_id = seq_id

location = SequenceLocation(
return SequenceLocation(
sequence_id=sequence_id,
interval=SequenceInterval(start=Number(value=start), end=Number(value=end)),
start=start, end=end,
)

if use_location_id:
_id = self._location_id(location.model_dump())
else:
quote_id = quote(label) if label else quote(seq_id_input)
_id = f"fusor.location_descriptor:{quote_id}"

location_descr = LocationDescriptor(id=_id, location=location)

if label:
location_descr.label = label
return location_descr

def add_additional_fields(
self,
@@ -636,7 +620,7 @@ def add_location_id(self, fusion: Fusion) -> Fusion:
]:
if element_genomic:
location = element_genomic.location
if location.type == VRSTypes.SEQUENCE_LOCATION.value:
if location.type == SequenceLocation:
location_id = self._location_id(location.model_dump())
element_genomic.location_id = location_id
if isinstance(fusion, CategoricalFusion) and fusion.critical_functional_domains:
@@ -648,7 +632,7 @@ def add_location_id(self, fusion: Fusion) -> Fusion:
element = fusion.regulatory_element
if element.feature_location:
location = element.feature_location
if location.type == VRSTypes.SEQUENCE_LOCATION.value:
if location.type == SequenceLocation:
location_id = self._location_id(location.model_dump())
element.feature_location.location_id = location_id
return fusion
@@ -674,7 +658,7 @@ def add_translated_sequence_id(
for element in fusion.structural_elements:
if isinstance(element, TemplatedSequenceElement):
location = element.region.location
if location.type == VRSTypes.SEQUENCE_LOCATION.value:
if location.type == SequenceLocation:
try:
new_id = translate_identifier(
self.seqrepo, location.sequence_id, target_namespace
@@ -690,7 +674,7 @@ def add_translated_sequence_id(
]:
if loc_descr:
location = loc_descr.location
if location.type == VRSTypes.SEQUENCE_LOCATION.value:
if location.type == SequenceLocation:
try:
new_id = translate_identifier(
self.seqrepo, location.sequence_id, target_namespace
@@ -716,6 +700,7 @@ def add_translated_sequence_id(
domain.sequence_location.location.sequence_id = new_id
return fusion

# TODO: should this be adding to the gene extensions or something instead?
def add_gene_descriptor(self, fusion: Fusion) -> Fusion:
"""Add additional fields to ``gene_descriptor`` in fusion object

@@ -730,24 +715,24 @@ def add_gene_descriptor(self, fusion: Fusion) -> Fusion:
for obj in prop:
if "gene_descriptor" in obj.model_fields:
label = obj.gene_descriptor.label
norm_gene_descr, _ = self._normalized_gene_descriptor(
norm_gene_descr, _ = self._normalized_gene(
label, use_minimal_gene_descr=False
)
if norm_gene_descr:
obj.gene_descriptor = norm_gene_descr
if fusion.regulatory_element and fusion.regulatory_element.associated_gene:
reg_el = fusion.regulatory_element
label = reg_el.associated_gene.label
norm_gene_descr, _ = self._normalized_gene_descriptor(
norm_gene_descr, _ = self._normalized_gene(
label, use_minimal_gene_descr=False
)
if norm_gene_descr:
reg_el.associated_gene = norm_gene_descr
return fusion

def _normalized_gene_descriptor(
def _normalized_gene(
self, query: str, use_minimal_gene_descr: bool = True
) -> tuple[GeneDescriptor | None, str | None]:
) -> tuple[Gene | None, str | None]:
"""Return gene descriptor from normalized response.

:param query: Gene query
@@ -761,7 +746,8 @@ def _normalized_gene_descriptor(
if gene_norm_resp.match_type:
gene_descr = gene_norm_resp.gene_descriptor
if use_minimal_gene_descr:
gene_descr = GeneDescriptor(
# TODO: how to handle gene_id here? add to extensions??
gene_descr = Gene(
id=gene_descr.id, gene_id=gene_descr.gene_id, label=gene_descr.label
)
return gene_descr, None
Expand Down
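Several hunks in this file replace `location.type == VRSTypes.SEQUENCE_LOCATION.value` with `location.type == SequenceLocation`. The sketch below shows how those two kinds of comparison differ, assuming `type` holds a string discriminator (as the earlier `.value` comparison suggests). The dataclass is a hypothetical stand-in with the field names used in this diff, not the actual `ga4gh.vrs` model.

```python
from dataclasses import dataclass


@dataclass
class SequenceLocation:
    """Hypothetical stand-in for a location model; not the ga4gh.vrs class."""

    sequence_id: str
    start: int
    end: int
    type: str = "SequenceLocation"


# Placeholder identifier, chosen only for illustration
loc = SequenceLocation(sequence_id="ga4gh:SQ.example", start=100, end=101)

# A string compared against the class object itself is never equal
matches_class = loc.type == SequenceLocation
# Comparing against the type name, or checking the instance, behaves as intended
matches_name = loc.type == "SequenceLocation"
matches_instance = isinstance(loc, SequenceLocation)
```

If the real model behaves like this stand-in, the guarded branches in `add_location_id` and `add_translated_sequence_id` would be skipped, so a string comparison or an `isinstance` check may be what is wanted there.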
83 changes: 27 additions & 56 deletions src/fusor/models.py
@@ -4,13 +4,9 @@
from enum import Enum
from typing import Any, Literal

from ga4gh.vrsatile.pydantic import return_value
from ga4gh.vrsatile.pydantic.vrsatile_models import (
CURIE,
GeneDescriptor,
LocationDescriptor,
SequenceDescriptor,
)
from ga4gh.core.domain_models import Gene
from ga4gh.vrs.models import SequenceLocation
from gene.schemas import CURIE
from pydantic import (
BaseModel,
ConfigDict,
@@ -34,7 +30,8 @@ class FUSORTypes(str, Enum):
TRANSCRIPT_SEGMENT_ELEMENT = "TranscriptSegmentElement"
TEMPLATED_SEQUENCE_ELEMENT = "TemplatedSequenceElement"
LINKER_SEQUENCE_ELEMENT = "LinkerSequenceElement"
GENE_ELEMENT = "GeneElement"
# TODO: I'm not sure if this needs to still be here or not
GENE = "Gene"
UNKNOWN_GENE_ELEMENT = "UnknownGeneElement"
MULTIPLE_POSSIBLE_GENES_ELEMENT = "MultiplePossibleGenesElement"
REGULATORY_ELEMENT = "RegulatoryElement"
@@ -63,11 +60,12 @@ class FunctionalDomain(BaseModel):

type: Literal[FUSORTypes.FUNCTIONAL_DOMAIN] = FUSORTypes.FUNCTIONAL_DOMAIN
status: DomainStatus
associated_gene: GeneDescriptor
associated_gene: Gene
id: CURIE | None = Field(None, alias="_id")
label: StrictStr | None = None
sequence_location: LocationDescriptor | None = None
sequence_location: SequenceLocation | None = None

# TODO: is this obsolete now that vrsatile has been removed?
_get_id_val = field_validator("id")(return_value)
Review comment (Contributor, Author):

Wanted to draw attention to these: everywhere return_value was used, it was verifying a CURIE, which is no longer needed. (Note to self: I will make sure this is tested and works properly with CURIE from the gene normalizer schema.)
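For readers unfamiliar with the check being discussed, the kind of CURIE verification mentioned in this comment can be pictured roughly as below. This is a minimal sketch: the regular expression and the `is_curie` helper are assumptions made for illustration, not the pattern used by vrsatile's `return_value` or by the gene normalizer's `CURIE` type.

```python
import re

# Assumed shape: "<prefix>:<reference>" with no whitespace. Illustrative only;
# the real CURIE type may accept a different set of strings.
_CURIE_PATTERN = re.compile(r"^\w[^:\s]*:\S+$")


def is_curie(value: str) -> bool:
    """Return True if ``value`` looks like a compact URI such as ``hgnc:1097``."""
    return bool(_CURIE_PATTERN.match(value))


braf_ok = is_curie("hgnc:1097")
bare_label_ok = is_curie("BRAF")
```

When the field is annotated with a type that already enforces this shape, a separate `field_validator` doing the same check becomes redundant, which appears to be the point of the comment.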


model_config = ConfigDict(
@@ -107,7 +105,7 @@ class StructuralElementType(str, Enum):
TRANSCRIPT_SEGMENT_ELEMENT = FUSORTypes.TRANSCRIPT_SEGMENT_ELEMENT.value
TEMPLATED_SEQUENCE_ELEMENT = FUSORTypes.TEMPLATED_SEQUENCE_ELEMENT.value
LINKER_SEQUENCE_ELEMENT = FUSORTypes.LINKER_SEQUENCE_ELEMENT.value
GENE_ELEMENT = FUSORTypes.GENE_ELEMENT.value
GENE_ELEMENT = Gene
UNKNOWN_GENE_ELEMENT = FUSORTypes.UNKNOWN_GENE_ELEMENT.value
MULTIPLE_POSSIBLE_GENES_ELEMENT = FUSORTypes.MULTIPLE_POSSIBLE_GENES_ELEMENT.value

@@ -129,9 +127,9 @@ class TranscriptSegmentElement(BaseStructuralElement):
exon_start_offset: StrictInt | None = 0
exon_end: StrictInt | None = None
exon_end_offset: StrictInt | None = 0
gene_descriptor: GeneDescriptor
element_genomic_start: LocationDescriptor | None = None
element_genomic_end: LocationDescriptor | None = None
gene: Gene
element_genomic_start: SequenceLocation | None = None
element_genomic_end: SequenceLocation | None = None

@model_validator(mode="before")
def check_exons(cls, values):
@@ -216,7 +214,7 @@ class LinkerElement(BaseStructuralElement, extra="forbid"):
type: Literal[FUSORTypes.LINKER_SEQUENCE_ELEMENT] = (
FUSORTypes.LINKER_SEQUENCE_ELEMENT
)
linker_sequence: SequenceDescriptor
linker_sequence: SequenceLocation

@field_validator("linker_sequence", mode="before")
def validate_sequence(cls, v):
@@ -226,7 +224,7 @@ def validate_sequence(cls, v):
v["sequence"] = v["sequence"].upper()
except KeyError as e:
raise TypeError from e
elif isinstance(v, SequenceDescriptor):
elif isinstance(v, SequenceLocation):
v.sequence = v.sequence.upper()
else:
raise TypeError
@@ -264,7 +262,7 @@ class TemplatedSequenceElement(BaseStructuralElement):
type: Literal[FUSORTypes.TEMPLATED_SEQUENCE_ELEMENT] = (
FUSORTypes.TEMPLATED_SEQUENCE_ELEMENT
)
region: LocationDescriptor
region: SequenceLocation
strand: Strand

model_config = ConfigDict(
@@ -292,27 +290,6 @@ class TemplatedSequenceElement(BaseStructuralElement):
)


class GeneElement(BaseStructuralElement):
"""Define Gene Element class."""

type: Literal[FUSORTypes.GENE_ELEMENT] = FUSORTypes.GENE_ELEMENT
gene_descriptor: GeneDescriptor

model_config = ConfigDict(
json_schema_extra={
"example": {
"type": "GeneElement",
"gene_descriptor": {
"id": "gene:BRAF",
"gene_id": "hgnc:1097",
"label": "BRAF",
"type": "GeneDescriptor",
},
}
},
)


class UnknownGeneElement(BaseStructuralElement):
"""Define UnknownGene class. This is primarily intended to represent a
partner in the result of a fusion partner-agnostic assay, which identifies
@@ -388,8 +365,8 @@ class RegulatoryElement(BaseModel):
type: Literal[FUSORTypes.REGULATORY_ELEMENT] = FUSORTypes.REGULATORY_ELEMENT
regulatory_class: RegulatoryClass
feature_id: str | None = None
associated_gene: GeneDescriptor | None = None
feature_location: LocationDescriptor | None = None
associated_gene: Gene | None = None
feature_location: SequenceLocation | None = None

_get_ref_id_val = field_validator("feature_id")(return_value)

Expand Down Expand Up @@ -614,7 +591,7 @@ class Assay(BaseModelForbidExtra):

AssayedFusionElements = list[
TranscriptSegmentElement
| GeneElement
| Gene
| TemplatedSequenceElement
| LinkerElement
| UnknownGeneElement
@@ -682,13 +659,10 @@ class AssayedFusion(AbstractFusion):
},
"structural_elements": [
{
"type": "GeneElement",
"gene_descriptor": {
"id": "gene:EWSR1",
"gene_id": "hgnc:3058",
"label": "EWSR1",
"type": "GeneDescriptor",
},
"type": "Gene",
"id": "gene:EWSR1",
"gene_id": "hgnc:3058",
"label": "EWSR1",
},
{"type": "UnknownGeneElement"},
],
@@ -699,7 +673,7 @@ class AssayedFusion(AbstractFusion):

CategoricalFusionElements = list[
TranscriptSegmentElement
| GeneElement
| Gene
| TemplatedSequenceElement
| LinkerElement
| MultiplePossibleGenesElement
@@ -781,13 +755,10 @@ class CategoricalFusion(AbstractFusion):
},
},
{
"type": "GeneElement",
"gene_descriptor": {
"id": "gene:ALK",
"type": "GeneDescriptor",
"gene_id": "hgnc:427",
"label": "ALK",
},
"type": "Gene",
"id": "gene:ALK",
"gene_id": "hgnc:427",
"label": "ALK",
},
],
"regulatory_element": {
Expand Down
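The schema examples in this file change from a nested `GeneElement` wrapping a `gene_descriptor` to a flat `Gene` object. A minimal sketch of that reshaping, using only the field names visible in the example payloads of this diff, is shown below. The helper `flatten_gene_element` is hypothetical and not part of FUSOR; later commits in this PR may settle on a different final shape.

```python
def flatten_gene_element(element: dict) -> dict:
    """Convert a pre-PR ``GeneElement`` example dict into the flat ``Gene`` shape.

    Hypothetical helper for illustration; elements of other types pass through.
    """
    if element.get("type") != "GeneElement":
        return element
    descriptor = element["gene_descriptor"]
    return {
        "type": "Gene",
        "id": descriptor["id"],
        "gene_id": descriptor["gene_id"],
        "label": descriptor["label"],
    }


# Old-style example taken from the removed lines of the AssayedFusion schema
old_element = {
    "type": "GeneElement",
    "gene_descriptor": {
        "id": "gene:EWSR1",
        "gene_id": "hgnc:3058",
        "label": "EWSR1",
        "type": "GeneDescriptor",
    },
}
new_element = flatten_gene_element(old_element)
```

The flat form drops one level of nesting and the `GeneDescriptor` type tag, which matches the added lines in the `AssayedFusion` and `CategoricalFusion` examples.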
1 change: 0 additions & 1 deletion src/fusor/nomenclature.py
@@ -1,7 +1,6 @@
"""Provide helper methods for fusion nomenclature generation."""

from biocommons.seqrepo.seqrepo import SeqRepo
from ga4gh.vrsatile.pydantic.vrs_models import SequenceLocation

from fusor.exceptions import IDTranslationException
from fusor.models import (