Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add examples for Cat-VRS data classes #87

Open
DanielPuthawala opened this issue Dec 5, 2024 · 9 comments
Open

Add examples for Cat-VRS data classes #87

DanielPuthawala opened this issue Dec 5, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed nov-24-tu-ballot Issues resulting from the 2024 Trial Use Ballot Discussion

Comments

@DanielPuthawala
Copy link
Collaborator

It was expressed at the last Cat-VRS call that having examples alongside the schemata and docs really helped people new to the project to orient themselves. It was requested that we add additional examples (2 or 3 per class?) that demosntrate different aspects of the schemas.

@DanielPuthawala DanielPuthawala added enhancement New feature or request help wanted Extra attention is needed labels Dec 5, 2024
@brendanreardon
Copy link
Collaborator

brendanreardon commented Dec 12, 2024

A non-exhaustive list of example updates that we want to make:


Originally posted by @korikuzma in #86 (comment):

One thing I really like about the new docs for VRS, is that it now includes examples. I think it would be really beneficial to users to add examples for each class.

@brendanreardon
Copy link
Collaborator

I just noticed that this thread may be a duplicate of #25. Should we close this in favor of the earlier one from Kori?

@korikuzma
Copy link
Contributor

I can close mine!

@brendanreardon
Copy link
Collaborator

brendanreardon commented Dec 17, 2024

I committed hgvs representations to proteinSequenceConsequence-ex1.yaml. Some questions and comments that I have about the examples are...

  • What do people think about not using the shorthand of vrs.json#ga4gh:VA.... for DefiningAlleleConstraint and members? I understand that this is quicker, but since it requires vrs-python to be set up locally to get unhash the id, it may add to confusion as new implementers are trying to get initiated.
  • In the hgvs list extension for canonicalAllele-ex1.yaml, the first nucleotideExpression has "top-level" for the nucleotideType. What does this mean? I couldn't find anything about this in either the va-spec, vrs, or gks documentation either.
  • As an aside, even though the extension field is unstructured, it may be nice to have documentation of why the examples are structured as they are. Regardless of how much documentation is written, in practice the examples are going to serve as templates for the community.
  • All current examples using either DefiningAlleleConstraint or DefiningLocationConstraint list relations within the constraint. As a mappable concept, they should list more than just the primaryCode. Similar to above, it would be nice to have definitions for the primaryCodes being used in the current examples.
  • Since the examples are written in .yaml, how do people feel about adding comments within the files? Specifically to link to relevant documentation for each section of an example. Alternatively, if folks are worried about cluttering the examples, we can add a README.md (or .rst) to the examples folder.

@jsstevenson
Copy link
Contributor

jsstevenson commented Dec 17, 2024

What do people think about not using the shorthand of vrs.json#ga4gh:VA.... for DefiningAlleleConstraint and members? I understand that this is quicker, but since it requires vrs-python to be set up locally to get unhash the id, it may add to confusion as new implementers are trying to get initiated.

We really should have a simple web app for this (generating variation object/hash)

@brendanreardon brendanreardon moved this from Todo to Needs Review in Cat-VRS Dec 18, 2024
@brendanreardon brendanreardon moved this from Needs Review to In Progress in Cat-VRS Dec 18, 2024
@brendanreardon
Copy link
Collaborator

Responses from original post

Originally posted by @afrubin in #86 (reply in thread)

Thank you for this. The examples are super useful for our project, especially when we are trying to engage with only a subset of the spec (Protein Sequence Consequence) at this stage.

On a related note, the recipe example file is super useful, but it would be great if the vrs.json file that is referenced in the constraints and members sections was also part of the repo for completeness.

Originally posted by @DanielPuthawala in #86 (reply in thread)

Like a fully-verbose example that drills down all the way to the GKS.core stuff?

Originally posted by @afrubin in #86 (reply in thread)

I think if one complete example was available it would help implementers, since it would be easier to see what CatVRS requires in one place. Having said that, it might be a lot of work if you don't have those files already so it might be out of scope for this TU ballot.

@brendanreardon brendanreardon added the nov-24-tu-ballot Issues resulting from the 2024 Trial Use Ballot Discussion label Jan 23, 2025
@brendanreardon
Copy link
Collaborator

brendanreardon commented Jan 23, 2025

Here is the exchange re: creating a DefiningLocation constraint example that starts and ends with ranges

Originally posted by @rhdolin in #86 (comment)

Example shows a location constraint with start and end integers, but should probably be start and end ranges.

Responses to the original thread:

Originally posted by @DanielPuthawala in #86 (reply in thread)

We're currently following VRS conventions, where locations typically contain start and end ranges, but for which unambiguous ranges can be replaced with an integer, so eg:

Location((4,4),(10,10)) = Location (4,10)

In the past I pushed for always using explicit ranges, and the group pushed back on the grounds that (1) this allows for shorter variant entries, (2) that it's not unduly ambiguous, and (3) that GKS already has the tools to support parsing integers as unambiguous ranges, so it wasn't an implementation issue.

Originally posted by @rhdolin in #86 (reply in thread)

Thanks @DanielPuthawala , I see your point. But for CNVs, isn't it often the case that, say, inner endpoints are known whereas outer endpoints are not? If I'm interpreting the example correctly, it seems to suggest that we know the precise endpoints for this CNV.

Originally posted by @DanielPuthawala in #86 (reply in thread)

Ah, no, it can handle those cases, too. (this is all inherited from VRS)
So let's say we have a range with endpoints A and B.
A and B are each ranges (so the full range is actually a tuple of ranges):
((A1, A2),(B1, B2))
Where A1, A2, B1, and B2 can either be integers or a null value (here I'll use "?" just for readibility)

So a really specific could be: ((4,4),(10,10))
Which as I said above by convention is evaluated to mean the same thing as (4,10)

Here is a range where the outermost possible endpoints are known, but the innermost endpoints are not known: ((4,?),(?,10))
any subsequence including or between 4 and 10 are possible in this range, including 4-10, 4-5, 6-8, 7-7, etc.

Here is a range where the innermost endpoints are known, but the outermost possible endpoints are not known: ((?,4),(10,?))
Any sequence that covers at least positions 4-10 are possible given this range, including 4-10, 2-10, 4-15, 3-20, etc.

And here's a range that is maximally vague: ((?,?),(?,?)), which admits any sequence, regardless of length or position.

Does that make sense to you?

Originally posted by @rhdolin in #86 (reply in thread)

Thanks @DanielPuthawala. Yes, makes sense. My question really was about the specific example - I haven't seen CNVs with precise endpoints, so I would have expected the example to have ranges.

@brendanreardon
Copy link
Collaborator

brendanreardon commented Feb 5, 2025

Is it intentional that the DefiningAlleleConstraint details in examples/canonicalAllele-ex1.yaml is identical to the allele stated in members? This makes sense, I think, because the constraint is used for matching to other categorical variants while members is being used to compare to other VRS Variations, but it still feels a bit redundant.

Edit: On 2025-02-05 Daniel confirmed that this is intentional, but sees the point.

@brendanreardon
Copy link
Collaborator

During the 2025-02-05 meeting the group was encouraged to come up with examples that include closeMatch, broadMatch, and narrowMatch conceptMappings for the relation field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed nov-24-tu-ballot Issues resulting from the 2024 Trial Use Ballot Discussion
Projects
Status: In Progress
Development

No branches or pull requests

4 participants