Update examples to VRS 2.0 #151

Draft: wants to merge 6 commits into main
208 changes: 114 additions & 94 deletions src/fusor/examples/bcr_abl1.json
@@ -1,111 +1,131 @@
{
"type": "CategoricalFusion",
"structural_elements": [
{
"type": "TranscriptSegmentElement",
"transcript": "refseq:NM_004327.3",
"gene_descriptor": {
"type": "GeneDescriptor",
"id": "normalize.gene:BCR",
"gene_id": "hgnc:1014",
"label": "BCR"
"structure": {
Review comment (Member):

The `structure` field should be a VRS Adjacency (for transcript junction descriptions) or a VRS DerivativeMolecule (still under development; use it for full fusion transcript representations). Adjacencies are much more common, as illustrated in this BCR::ABL1 example.
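
A minimal sketch of the Adjacency shape used in this example. It assumes plain Python dicts that mirror the JSON below rather than any fusor or VRS-Python class, and `make_adjacency` is a hypothetical helper:

```python
from typing import Optional


def make_adjacency(five_prime: dict, three_prime: dict,
                   linker: Optional[str] = None) -> dict:
    """Join a 5' and a 3' SequenceLocation into an Adjacency-shaped dict.

    Hypothetical helper: keys mirror the example JSON in this PR,
    not a published library API.
    """
    adjacency = {
        "type": "Adjacency",
        # 5' partner first, then 3' partner, as in this example
        "adjoinedSequences": [five_prime, three_prime],
    }
    if linker:
        # Optional sequence observed between the two partners at the junction
        adjacency["linker"] = {
            "type": "LiteralSequenceExpression",
            "sequence": linker,
        }
    return adjacency


# Usage with the junction coordinates from this example (references omitted)
bcr = {"type": "SequenceLocation", "end": 23290413}
abl1 = {"type": "SequenceLocation", "start": 130854064}
structure = make_adjacency(bcr, abl1, linker="CCCGTC")
```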

"type": "Adjacency",
"adjoinedSequences": [{
"type": "SequenceLocation",
Review comment (Member):

The `adjoinedSequences` SequenceLocation objects for the adjacency are based on the genomic coordinates of the fusion junction. For categorical fusions and many reported assayed fusions, these are typically at (or close to) transcript boundaries as aligned to a chromosome.
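
Because these junction coordinates usually sit at or near an aligned transcript boundary, one way to sanity-check them is to measure the distance to the nearest exon boundary. This is a sketch that assumes you already have the aligned exon boundaries as sorted genomic coordinates. `nearest_boundary` is a hypothetical helper, and the boundary values in the usage line are made up for illustration:

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_boundary(junction: int, boundaries: List[int]) -> Tuple[int, int]:
    """Return (boundary, offset) for the exon boundary closest to `junction`.

    `boundaries` must be sorted genomic coordinates. The offset is a signed
    genomic distance (junction - boundary); turning it into a VICC exon
    offset would also require the transcript's strand, which is ignored here.
    """
    if not boundaries:
        raise ValueError("no exon boundaries supplied")
    i = bisect_left(boundaries, junction)
    # Only the boundaries immediately left and right of the junction can be nearest
    candidates = boundaries[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda b: abs(junction - b))
    return best, junction - best


print(nearest_boundary(260, [100, 250, 400]))  # prints (250, 10)
```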

"sequenceReference": {
"id": "GRCh38:chr22",
"type": "SequenceReference",
"refgetAccession": "SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ",
"residueAlphabet": "na"
},
"element_genomic_end": {
"id": "fusor.location_descriptor:NC_000022.11",
"type": "LocationDescriptor",
"label": "NC_000022.11",
"location": {
"type": "SequenceLocation",
"sequence_id": "refseq:NC_000022.11",
"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 23253980
},
"end": {
"type": "Number",
"value": 23253981
"end": 23290413,
Review comment (Member):

Note that here we (appropriately) use `end`, but a 5' partner expressed in genomic coordinates should use `start` if its transcribed sequence is on the negative strand. Using `end` represents a boundary that includes the sequence to its left (here, the sequence to the left of aligned genomic position 23290413, corresponding to BCR exon 14).
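
The rule in this comment can be expressed as a small lookup. In this sketch, the 5' partner cases come from the comment above. The 3' partner cases are an assumption that mirrors them; the ABL1 3' partner in this example, on the positive strand, uses `start`. `genomic_boundary_field` and `junction_location` are hypothetical helpers:

```python
def genomic_boundary_field(partner: str, strand: int) -> str:
    """Return "start" or "end": which genomic SequenceLocation field marks the junction.

    partner: "5prime" or "3prime"; strand: +1 or -1 (chromosome strand of the transcript).
    The 3' partner behaviour is an assumption mirroring the 5' rule.
    """
    if partner not in ("5prime", "3prime"):
        raise ValueError(f"unknown partner: {partner!r}")
    if strand not in (1, -1):
        raise ValueError(f"strand must be +1 or -1, got {strand!r}")
    # `end` keeps the sequence left of the boundary, `start` keeps the sequence right of it.
    # A 5' partner on the + strand (or a 3' partner on the - strand) keeps the left side.
    keeps_left = (partner == "5prime") == (strand == 1)
    return "end" if keeps_left else "start"


def junction_location(sequence_reference: dict, position: int,
                      partner: str, strand: int) -> dict:
    """Build a chromosome SequenceLocation with only the appropriate boundary set."""
    return {
        "type": "SequenceLocation",
        "sequenceReference": sequence_reference,
        genomic_boundary_field(partner, strand): position,
    }


chr22 = {"id": "GRCh38:chr22", "type": "SequenceReference",
         "refgetAccession": "SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ"}
bcr_junction = junction_location(chr22, 23290413, "5prime", strand=1)
# bcr_junction sets "end": 23290413, matching the BCR location in this example
```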


"extensions": [
{
"name": "NM_004327.4:e._14",
Review comment (Member):

This extension on the genomic location expresses the same transcript segment boundary using the VICC exon representation.
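
This sketch shows how such an extension could be assembled. It assumes the naming pattern seen in this example: `<accession>:e._<exon>` for an exon end and `<accession>:e.<exon>_` for an exon start. This example does not show how a nonzero offset would appear in `name`, so the sketch records the offset only in `value`. `vicc_exon_extension` is a hypothetical helper:

```python
def vicc_exon_extension(sequence_reference: dict, exon: int,
                        offset: int, side: str) -> dict:
    """Build a VICC exon-representation extension for a transcript boundary.

    side: "end" for a 5' partner boundary, "start" for a 3' partner boundary.
    The name pattern is copied from this example, not from a specification.
    """
    accession = sequence_reference["id"]
    if side == "end":
        name = f"{accession}:e._{exon}"
    elif side == "start":
        name = f"{accession}:e.{exon}_"
    else:
        raise ValueError(f"side must be 'start' or 'end', got {side!r}")
    return {
        "name": name,
        "description": "VICC exon representation of the aligned transcript boundary.",
        "value": {
            f"exon_{side}": exon,
            f"exon_{side}_offset": offset,
            "sequenceReference": sequence_reference,
        },
    }


nm_004327 = {
    "id": "NM_004327.4",
    "type": "SequenceReference",
    "refgetAccession": "SQ.kpytJsXw3BwLC3oBSjHQS1kwxs4WO3I3",
    "residueAlphabet": "na",
}
ext = vicc_exon_extension(nm_004327, exon=14, offset=0, side="end")
```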

"description": "VICC exon representation of the aligned transcript boundary.",
"value": {
"exon_end": 14,
"exon_end_offset": 0,
"sequenceReference":{
"type": "SequenceReference",
"id": "NM_004327.4",
"refgetAccession": "SQ.kpytJsXw3BwLC3oBSjHQS1kwxs4WO3I3",
"residueAlphabet": "na"
}
}
}
},
"exon_end": 2,
"exon_end_offset": 182
},
{
"type": "LinkerSequenceElement",
"linker_sequence": {
"id": "sequence:ACTAAAGCG",
"type": "SequenceDescriptor",
"sequence": "ACTAAAGCG",
"residue_type": "SO:0000348"
}
},
{
"type": "TranscriptSegmentElement",
"transcript": "refseq:NM_005157.5",
"exon_start": 2,
"exon_start_offset": -173,
"gene_descriptor": {
"id": "normalize.gene:ABL1",
"type": "GeneDescriptor",
"label": "ABL1",
"gene_id": "hgnc:76"
},
"element_genomic_start": {
"id": "fusor.location_descriptor:NC_000009.12",
"type": "LocationDescriptor",
"label": "NC_000009.12",
"location": {
"type": "SequenceLocation",
"sequence_id": "refseq:NC_000009.12",
"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 130854064
},
{
"name": "NM_004327.4:c._2782",
"description": "Transcript SequenceLocation of the aligned transcript boundary.",
"value": {
"type": "SequenceLocation",
"sequenceReference": {
"id": "NM_004327.4",
"type": "SequenceReference",
"refgetAccession": "SQ.kpytJsXw3BwLC3oBSjHQS1kwxs4WO3I3",
"residueAlphabet": "na"
},
"end": {
"type": "Number",
"value": 130854065
}
"end": 3234
Review comment (Member):

Here we again use `end`, as we always expect in the transcript sequence representation of a 5' partner. For transcripts aligning to the negative strand of a chromosome, this would remain `end`, even though the chromosome representation (on line 13) would use `start`.
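
Combined with the genomic-coordinate rule for 5' partners (`end` on the positive strand, `start` on the negative strand), this gives a checkable rule. This sketch assumes, as in this example, that each junction location sets exactly one of `start` or `end`. `check_five_prime_fields` is a hypothetical helper:

```python
from typing import List


def check_five_prime_fields(genomic_loc: dict, transcript_loc: dict,
                            strand: int) -> List[str]:
    """Return a list of problems with the boundary fields of a 5' partner.

    The transcript location should always use `end`; the chromosome location
    uses `end` on the + strand and `start` on the - strand.
    """
    problems = []
    # Transcript coordinates: always `end` for a 5' partner, regardless of strand
    if "end" not in transcript_loc or "start" in transcript_loc:
        problems.append("transcript location should set only 'end'")
    # Chromosome coordinates: depends on the strand the transcript aligns to
    expected = "end" if strand == 1 else "start"
    unexpected = "start" if expected == "end" else "end"
    if expected not in genomic_loc or unexpected in genomic_loc:
        problems.append(f"genomic location on strand {strand:+d} should set only '{expected}'")
    return problems


# BCR in this example: genomic end 23290413, transcript end 3234, + strand
print(check_five_prime_fields({"end": 23290413}, {"end": 3234}, strand=1))  # prints []
```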

}
},
{
"name": "gene",
"description": "The gene concept (BCR) associated with this fusion partner.",
"value": {
"code": "hgnc:1014",
"system": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/",
"label": "BCR"
Review comment (Member) on lines +47 to +49:

This is a GKS Coding object.
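
This is a minimal sketch of the `gene` extension built around a Coding-like record. It assumes only the three fields used in this example (`code`, `system`, `label`); the full GKS Coding definition may allow more. `Coding` and `gene_extension` here are hypothetical stand-ins, not the GKS or fusor classes:

```python
from dataclasses import asdict, dataclass
from typing import Optional

HGNC_SYSTEM = "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/"


@dataclass(frozen=True)
class Coding:
    """Hypothetical stand-in for a GKS Coding: a code drawn from a named system."""
    code: str
    system: str
    label: Optional[str] = None

    def to_dict(self) -> dict:
        # Drop unset optional fields so the output matches the example JSON
        return {k: v for k, v in asdict(self).items() if v is not None}


def gene_extension(hgnc_id: str, symbol: str) -> dict:
    """Wrap an HGNC gene Coding in the `gene` extension used for each fusion partner."""
    return {
        "name": "gene",
        "description": f"The gene concept ({symbol}) associated with this fusion partner.",
        "value": Coding(code=hgnc_id, system=HGNC_SYSTEM, label=symbol).to_dict(),
    }


bcr_gene = gene_extension("hgnc:1014", "BCR")
```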

}
}
}
]},
{
"type": "SequenceLocation",
"sequenceReference": {
"id": "GRCh38:chr9",
"type": "SequenceReference",
"refgetAccession": "SQ.KEO-4XBcm1cxeo_DIQ8_ofqGUkp4iZhI",
"residueAlphabet": "na"
},
"start": 130854064,
"extensions": [
{
"name": "NM_005157.6:e.2_",
"description": "VICC exon representation of the aligned transcript boundary.",
"value": {
"exon_start": 2,
"exon_start_offset": 0,
"sequenceReference":{
"id": "NM_005157.6",
"type": "SequenceReference",
"refgetAccession": "SQ.w8Qg3x-PQ2akJrJQeGEN-_eBUMo1H1CL",
"residueAlphabet": "na"
}
}
},
{
"name": "NM_005157.6:c.80_",
"description": "Transcript SequenceLocation of the aligned transcript boundary.",
"value": {
"type": "SequenceLocation",
"sequenceReference": {
"id": "NM_005157.6",
"type": "SequenceReference",
"refgetAccession": "SQ.w8Qg3x-PQ2akJrJQeGEN-_eBUMo1H1CL",
"residueAlphabet": "na"
},
"end": 273
}
},
{
"name": "gene",
"description": "The gene concept (ABL1) associated with this fusion partner.",
"value": {
"code": "hgnc:76",
"system": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/",
"label": "ABL1"
}
}
]
}],
"linker": {
"type": "LiteralSequenceExpression",
"sequence": "CCCGTC"
}
],
"r_frame_preserved": true,
"critical_functional_domains": [
},
"readingFramePreserved": true,
Review comment (Member):

Using camelCase, consistent with usage in the GKS specifications.
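
Most of these renames are mechanical, so a key converter can do the bulk of the work. This sketch uses hypothetical helpers `snake_to_camel` and `camelize_keys`. Renames that are not mechanical (such as `r_frame_preserved` to `readingFramePreserved`) go in an explicit table. Keys that this PR keeps in snake_case (for example `exon_end` inside the VICC extensions) can be passed in `keep`:

```python
from typing import Any, FrozenSet

# Renames that a mechanical snake_case -> camelCase conversion would not produce
RENAMES = {"r_frame_preserved": "readingFramePreserved"}


def snake_to_camel(name: str) -> str:
    """Convert snake_case to camelCase; keys with a leading underscore are left alone."""
    if name.startswith("_"):
        return name
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize_keys(obj: Any, keep: FrozenSet[str] = frozenset()) -> Any:
    """Recursively rename dict keys, leaving keys listed in `keep` unchanged."""
    if isinstance(obj, dict):
        return {
            (key if key in keep else RENAMES.get(key, snake_to_camel(key))):
                camelize_keys(value, keep)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [camelize_keys(item, keep) for item in obj]
    return obj
```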

"criticalFunctionalDomains": [
{
"type": "FunctionalDomain",
"status": "preserved",
"associated_gene": {
"id": "normalize.gene:hgnc%3A76",
"type": "GeneDescriptor",
"label": "ABL1",
"gene_id": "hgnc:76"
"gene": {
Review comment (Contributor):

Mostly noting this to help myself while working on the updates to fusor: this should be `associatedGene`.

"code": "hgnc:76",
"system": "https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/",
"label": "ABL1"
},
"_id": "interpro:IPR000980",
"id": "interpro:IPR000980",
"label": "SH2 domain",
"sequence_location": {
"id": "fusor.location_descriptor:NP_005148.2",
"type": "LocationDescriptor",
"location": {
"type": "SequenceLocation",
"sequence_id": "refseq:NP_005148.2",
Review comment (Member):

Can we retain this identifier as a mapping, or something of that nature?

Reply (Contributor Author):

I think so? I was mostly following the convention in `element_genomic_end` and `element_genomic_start`, where I removed the identifier.

"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 127
},
"end": {
"type": "Number",
"value": 202
}
}
}
"sequenceLocation": {
"type": "SequenceLocation",
"sequenceReference": {
"id": "GRCh38:chr22",
"type": "SequenceReference",
"refgetAccession": "SQ.7B7SHsmchAR0dFcDCuSFjJAo7tX87krQ",
"residueAlphabet": "na"
},
"start": 127,
"end": 202
}
}
]
52 changes: 15 additions & 37 deletions src/fusor/examples/tpm3_ntrk1.json
@@ -6,62 +6,40 @@
"transcript": "refseq:NM_152263.3",
"exon_end": 8,
"exon_end_offset": 0,
"gene_descriptor": {
"gene": {
"id": "normalize.gene:TPM3",
"type": "GeneDescriptor",
"type": "Gene",
"label": "TPM3",
"gene_id": "hgnc:12012"
},
"element_genomic_end": {
"id": "fusor.location_descriptor:NC_000001.11",
"type": "LocationDescriptor",
"label": "NC_000001.11",
"location": {
"type": "SequenceLocation",
"sequence_id": "refseq:NC_000001.11",
"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 154170399
},
"end": {
"type": "Number",
"value": 154170400
}
"sequenceReference": {
"type": "SequenceReference",
"refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
"residueAlphabet": null,
"start": 154170399
}
}
}
},
{
"type": "TranscriptSegmentElement",
"transcript": "refseq:NM_002529.3",
"exon_start": 10,
"exon_start_offset": 0,
"gene_descriptor": {
"gene": {
"id": "normalize.gene:NTRK1",
"type": "GeneDescriptor",
"type": "Gene",
"label": "NTRK1",
"gene_id": "hgnc:8031"
},
"element_genomic_start": {
"id": "fusor.location_descriptor:NC_000001.11",
"type": "LocationDescriptor",
"label": "NC_000001.11",
"location": {
"type": "SequenceLocation",
"sequence_id": "refseq:NC_000001.11",
"interval": {
"type": "SequenceInterval",
"start": {
"type": "Number",
"value": 156874626
},
"end": {
"type": "Number",
"value": 156874627
}
}
"type": "SequenceLocation",
"sequenceReference": {
"type": "SequenceReference",
"refgetAccession": "SQ.Ya6Rs7DHhDeg7YaOSg1EoNi3U_nQ9SvO",
"residueAlphabet": null,
"start": 156874626
}
}
}