Design Patterns for SO/MSO #1

Open
msinclair2 opened this issue Oct 8, 2017 · 10 comments

Comments

@msinclair2
Owner

@dosumis

In discussion with @keilbeck, we realized that there will be some classes in both the SO and MSO that will have no counterpart in the other ontology. For example, in the MSO, there will be some sequence molecules that either contain no information or for which the information is not yet known. In the SO, there will be some arbitrary artifacts that are not easy to associate with discrete sequence molecules or regions of molecules, such as assemblies and contigs. It won't be possible, then, to generate one ontology from the other in toto, though it should be possible to generate the majority of classes of one ontology from the other. @mikebada has already done a lot of work creating an independent MSO that is BFO-compliant, that integrates with ChEBI, and that annotates what the counterpart for each class is in the SO if known. And he has done so largely using logical axioms, rather than named classes, relying on the reasoner to classify the taxonomy; that implies a design pattern we can extract. I need to discuss this in more detail with him.

Most (all?) of the entities in SO are information entities. In BFO terms, they are generically dependent continuants. SO has 4 top-level classes, all of which would fall under BFO's generically dependent continuant (I think).

One of these, "sequence_attribute", comprises "attributes describing a quality of sequence". These are abstract qualities of sequence entities. They inhere in specific qualities of specific molecules described in the MSO. The relation between a generically dependent continuant, like the standard text of a novel, and a specifically dependent continuant, like the color of the ink it's printed in, in a particular copy of the book, is "is concretized by". I think this relation would hold between sequence_attributes in SO and qualities in MSO. So I'm ready to sketch out a first design pattern:

  1. abstract sequence quality <---> specific sequence quality
    To infer MSO classes from SO classes using this pattern, we require that all subclasses of "sequence_attribute" (domain) must be concretized by a corresponding MSO class that is a subclass of BFO "quality" (range).

The other three top-level classes of SO (sequence_collection, sequence_feature, and sequence_variant) describe the information (e.g. recognition sites, reference genomes) that inheres in actual sequence molecules as independent continuants in MSO. An analogy would be the standard text of a novel and a book, or the print in a book, capable of bearing that text. We can now describe a second design pattern, which will cover the majority of SO and MSO classes:

  1. genomic information <---> specific sequence molecule
    To infer MSO classes from SO classes using this pattern, we require that all subclasses of "sequence_collection", "sequence_feature", and "sequence_variant" (domain) must be generically_dependent_on a corresponding MSO class that is a subclass of ChEBI "chemical entity" (range).

Some SO classes describe the recognition sites of boundaries between regions. In this case, these SO classes would inhere rather in boundary entities, which are "immaterial entities" in BFO. So we have yet another pattern:

  1. genomic boundary information <---> specific sequence boundary
    To infer MSO classes from SO classes using this pattern, we require that all subclasses of "junction", which is a subclass of "sequence_feature" (domain), must be generically_dependent_on a corresponding MSO class that is a subclass of BFO's "immaterial entity" (range).

From these, the majority of classes that should belong to MSO will be generated from SO. We further need a way to infer from SO the key annotations (such as rdfs:label and definition) for a minimally sufficient class definition for the MSO classes. These patterns will also tell us what upper-level BFO or ChEBI class the MSO classes are subclasses of, to begin reconstructing the taxonomy of MSO.
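
The three patterns above amount to a lookup from an SO ancestor class to a relation and an upper-level parent for the MSO counterpart. Here is a minimal Python sketch of that lookup. The table `PATTERNS` and the function `pattern_for` are hypothetical names invented for illustration, and the relation names are written as plain strings rather than real IRIs.

```python
# Hypothetical lookup table for the three patterns described above.
# Order matters: "junction" is a subclass of "sequence_feature", so the
# more specific boundary pattern is listed first.
PATTERNS = [
    ("junction", "generically_depends_on", "BFO immaterial entity"),
    ("sequence_attribute", "is_concretized_by", "BFO quality"),
    ("sequence_collection", "generically_depends_on", "ChEBI chemical entity"),
    ("sequence_feature", "generically_depends_on", "ChEBI chemical entity"),
    ("sequence_variant", "generically_depends_on", "ChEBI chemical entity"),
]


def pattern_for(ancestors):
    """Return (relation, MSO upper class) for an SO class.

    `ancestors` is the set of SO class names the class is a subclass of,
    including itself. Returns None when no pattern applies, e.g. for SO
    classes that have no MSO counterpart.
    """
    for so_root, relation, mso_parent in PATTERNS:
        if so_root in ancestors:
            return relation, mso_parent
    return None
```

For example, `pattern_for({"junction", "sequence_feature"})` picks the boundary pattern rather than the general molecule pattern, because the `junction` row is checked first.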

A fourth design pattern, to complete the full taxonomic reconstruction of MSO, can in consultation with @mikebada be extracted from the work he has already done.

There will need to be two additional templates:

  1. Generating the subontology of classes that exist only in MSO with no relation to any SO class, and

  2. Generating the subontology of classes that exist only in SO with no relation to any MSO class.

An additional design principle we have adopted for convenience is:

The principle of identical 7 digit IDs
Classes in SO and MSO that are connected by a "generically depends on" or "is concretized by" relation should have the exact same 7 digit ID number, just with different prefixes (SO_ vs MSO_) to make programmatic IRI identification of counterparts much easier.
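
To make the principle concrete, here is a small Python sketch of what "programmatic IRI identification of counterparts" could look like. The OBO-style IRI base, the `MSO_` IRI form, and the function name `counterpart_iri` are assumptions made for illustration, not something fixed by either ontology.

```python
import re

# Assumed IRI base; the MSO_ form in particular is a guess for illustration.
OBO_BASE = "http://purl.obolibrary.org/obo/"
_LOCAL_ID = re.compile(r"^(SO|MSO)_(\d{7})$")


def counterpart_iri(iri):
    """Swap the SO_/MSO_ prefix while keeping the 7-digit ID unchanged."""
    if not iri.startswith(OBO_BASE):
        raise ValueError(f"not an OBO-style IRI: {iri}")
    match = _LOCAL_ID.match(iri[len(OBO_BASE):])
    if match is None:
        raise ValueError(f"not an SO/MSO class IRI: {iri}")
    prefix, digits = match.groups()
    other = "MSO" if prefix == "SO" else "SO"
    return f"{OBO_BASE}{other}_{digits}"
```

Calling `counterpart_iri(OBO_BASE + "SO_0000704")` would return the same IRI with `MSO_0000704` as its local part. Note that this only works while every linked pair really does share its digits.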

For the addition of future terms:

Any time a new term in SO is added that requires generic dependence on some bearer, an MSO counterpart term with the same 7-digit ID but different namespace must be generated at the same time, and its precise place in the MSO taxonomy decided upon. On the other hand, the only time a new MSO term should be created independently of SO is when some molecular feature with no known genomic annotation or informational importance is discovered and needs to be described.

I am new to this discussion, and have not been in the loop of all the earlier conversations in the community about this project and what the community wants and needs for practical purposes. I value all feedback not just on the theoretical aspects of design templates, and what steps to take next, but also the practical aspects of fulfilling what the community needs and not including superfluous features.

@dosumis

dosumis commented Oct 10, 2017

Hi Michael,

Rather a lot to take in here. A few quick comments:

The principle of identical 7 digit IDs
Classes in SO and MSO that are connected by a "generically depends on" or "is concretized by" relation should have the exact same 7 digit ID number, just with different prefixes (SO_ vs MSO_) to make programmatic IRI identification of counterparts much easier.

I would strongly advise not relying on this. The whole point of numeric IDs is that they are completely free of semantics. It's a heavy burden on maintenance to make sure IDs always line up. One ID in the wrong place can mess it up. If you merge or obsolete in one ontology you always have to do so in the other. There's a strong temptation to build software that relies on ID mapping & then you have the burden for ever. And you may decide that there are cases which are not 1:1 SO:MSO. I say all this as someone who's been burned by this in the past.

The mapping also doesn't make sense without equivalentTo axioms connecting SO and MSO. SubClassOf axioms will be true for all subclasses.

  1. As most of the classification hierarchy of MSO and SO will be identical it is essential that you come up with patterns to infer classification in one from classification in the other (for all relevant classes). Without this, they'll inevitably go out of sync.

  2. The patterns you've outlined look reasonable. I don't have much time to chat about abstract/upper-ontology modelling, but could potentially help a bit to make sure the patterns you want do useful work in automating classification. Do you have any draft equivalent Class axioms? It might help to build a toy ontology which has examples of terms defined with the various patterns you outline above + some relevant imports. You could then use a reasoner (Elk is probably best) to test classification.

@msinclair2
Owner Author

Thank you @dosumis for the warning about same IDs. I will incorporate your suggestion.

I would like to ask you to clarify a bit what you mean by "equivalent Class axioms". Do you mean the OWL definition, and the common best practice in design, where you define a "closed world" set with both an existential and universal quantifier, and make that "equivalent to" a named class? Could you give me an example? I am new to ontology design, but I know the basics and am a quick study if pointed in the right direction.

@dosumis

dosumis commented Oct 11, 2017

equivalent Class axioms = equivalentTo axioms (e.g. 'arm bone' EquivalentTo bone and 'part of' some arm).

I wouldn't advise using the closure pattern. While potentially useful, it doesn't scale. Best to stick to OWL2 EL and use the ELK reasoner. In practice this means existential restrictions only.

@msinclair2
Owner Author

I understand @dosumis. These are defined classes, with necessary and sufficient conditions for membership. Any individual for which the axioms hold true is a member of the class, and all members of the class must satisfy the axioms. On the other hand, with a mere inclusion (SubClassOf) axiom, we can only infer that if an individual is a member of the class, it must satisfy the axiom. But just because an individual satisfies the axiom, it does not necessarily mean it is a member of the class. (This is the difference between "if and only if" and a mere "if/then".)
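
The distinction can be shown with a toy membership check in Python, using the 'arm bone' example from earlier in the thread. This is only a sketch: the entity `humerus`, the dictionary layout, and the function names are invented for illustration, and fillers are matched by exact name rather than by subsumption as a real reasoner would do.

```python
def satisfies(entity, genus, relation, filler):
    """True if the entity has the genus type and the relation to the filler."""
    return genus in entity["types"] and (relation, filler) in entity["relations"]


def infer_membership(entity, class_name, definition, axiom_kind):
    """Decide whether membership of `entity` in `class_name` can be inferred.

    axiom_kind is "EquivalentTo" (necessary and sufficient conditions) or
    "SubClassOf" (necessary conditions only).
    """
    if class_name in entity["types"]:
        return True  # membership was asserted directly
    if axiom_kind == "EquivalentTo":
        # "if and only if": satisfying the definition is enough
        return satisfies(entity, *definition)
    # "if/then" only: satisfying the condition proves nothing
    return False


# Illustrative individual: a bone that is part of an arm.
humerus = {"types": {"bone"}, "relations": {("part_of", "arm")}}
arm_bone_definition = ("bone", "part_of", "arm")
```

With `"EquivalentTo"` the humerus is inferred to be an arm bone; with `"SubClassOf"` nothing is inferred unless the type was asserted by hand.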

There are already many equivalent class axioms that @mikebada has written in his draft of the MSO. I'll pull some examples later today.

The taxonomy of MSO is not the same as the SO, at least not as we have it so far. Part of the reason is that Mike has integrated MSO into ChEBI and follows its structure, because the MSO describes biological molecules. The SO is not structured by ChEBI. We will need to use ChEBI as well to generate the MSO.

@msinclair2
Owner Author

Sorry for the delay @dosumis, just discussing design issues with @mikebada before responding.

@msinclair2
Owner Author

Hi @dosumis, we (myself, @mikebada, @keilbeck) are still discussing which ontology (MSO or SO) to use as the base to infer the other from. Once we hash it out I will get back with succinct and precise design patterns. I appreciate your patience, interest, and help!

@mikebada

I previously suggested that each SO class could be necessarily and sufficiently defined (i.e., with an OWL equivalentClass axiom) simply as being generically dependent on its corresponding MSO class, e.g., in Manchester OWL syntax:

SO:gene equivalentTo (generically_depends_on some MSO:gene)

One problem I can think of with this approach is that such a definition obviously can’t be created for an SO class that doesn’t have an analog in the MSO, e.g., SO:assembly.
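
As a rough Python sketch of this approach and its gap: the function below writes the definition for every SO class that has an MSO analog and reports the ones that do not. It assumes, purely for illustration, that counterparts share a class name (as with SO:gene and MSO:gene above); `so_definitions` is a hypothetical helper.

```python
def so_definitions(so_classes, mso_classes):
    """Build 'generically_depends_on' definitions for SO classes.

    Returns (axioms, undefined): Manchester-style axiom strings for SO
    classes with an MSO analog, plus the names of SO classes that have no
    analog and therefore get no definition of this kind.
    """
    axioms, undefined = [], []
    for name in so_classes:
        if name in mso_classes:
            axioms.append(
                f"SO:{name} equivalentTo (generically_depends_on some MSO:{name})"
            )
        else:
            undefined.append(name)
    return axioms, undefined
```

Running it on `["gene", "assembly"]` with only `gene` present in the MSO gives one axiom for SO:gene and leaves `assembly` in the undefined list.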

Additionally, I figured that the formal definitions of the current public SO could just be transferred to the corresponding MSO classes, e.g.:

SO:intronic_regulatory_region equivalentTo (SO:transcription_regulatory_region and part_of some SO:intron)

would be removed from the SO but transferred to the corresponding MSO class:

MSO:intronic_regulatory_region equivalentTo (MSO:transcription_regulatory_region and part_of some MSO:intron)

One issue with this approach is that these useful definitions would be removed from the SO. This might be OK for those SO classes that have MSO analogs (which will be the very large majority of classes, I think), but a problem arises for those SO classes that don’t have MSO analogs, as those definitions would then be lost. The same thing would happen to the necessary axioms for such classes as well.

So, what I propose is that we keep the necessary and sufficient and the necessary axioms in the SO (with an important caveat, later) and also recreate them in the MSO, e.g., have both:

SO:intronic_regulatory_region equivalentTo (SO:transcription_regulatory_region and part_of some SO:intron)
MSO:intronic_regulatory_region equivalentTo (MSO:transcription_regulatory_region and part_of some MSO:intron)

In addition to keeping these axioms in the SO (as well as recreating them in the MSO), we could also add necessary axioms linking the SO classes to the MSO classes, e.g.,

SO:intronic_regulatory_region subclassOf (generically_depends_on some MSO:intronic_regulatory_region)

The aforementioned caveat is that it wouldn’t make sense to keep all of the axioms in the SO, as some clearly only apply to MSO classes, e.g.:

MSO:enzymatic_RNA equivalentTo (MSO:transcript and has_quality some MSO:enzymatic)

It wouldn’t make sense to have this definition in the SO as well, as the generically dependent sequence entities obviously don’t have enzymatic functionality. So, we’d have to figure out which kinds of axioms should appear in both the SO and the MSO and which should be transferred exclusively to the MSO. I think all of the “topological” axioms (e.g., part_of, adjacent_to, overlaps)--which I’m guessing constitute most of the axioms--can be represented in both.
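
One way to picture that split is a filter over the MSO axioms, sketched here in Python. It assumes the axioms are held as Manchester-style strings like the ones above and treats only part_of, adjacent_to, and overlaps as shareable; both the set and the helper names are illustrative assumptions, and the real list of shareable relations would need to be decided.

```python
import re

# Relations called "topological" in this thread; assumed safe to state in
# both ontologies. The exact set is an assumption.
TOPOLOGICAL = {"part_of", "adjacent_to", "overlaps"}


def relations_in(axiom):
    """Return the relation names used in existential restrictions (R some C)."""
    return set(re.findall(r"(\w+)\s+some\b", axiom))


def mirror_to_so(mso_axiom):
    """Return the SO version of an MSO axiom, or None if it is MSO-only."""
    if not relations_in(mso_axiom) <= TOPOLOGICAL:
        return None  # e.g. has_quality: meaningful for molecules only
    return re.sub(r"\bMSO:", "SO:", mso_axiom)
```

Under these assumptions the intronic_regulatory_region definition is mirrored into the SO, while the enzymatic_RNA definition stays in the MSO only.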

So, I think that’s my current thinking. Lemme know if you’d like to discuss...

@msinclair2
Owner Author

msinclair2 commented Oct 17, 2017 via email

@msinclair2
Owner Author

msinclair2 commented Oct 22, 2017

David (@dosumis),

I'm at a bit of a loss on how to proceed without ID mapping. SO has been in continuous use for years and we can't go changing IDs that users are already familiar with. But if we want to generate SO from MSO, how do we make sure the right IDs are generated? Should we use ID mapping once to refactor the existing SO and then not use it for new terms? But then, don't we want to be able to dynamically generate the entire SO ontology from the MSO at any time, to ease the burden on curators and prevent them going out of sync due to human error?

There are a lot of places where I'm confused how to proceed, and that I'd like to ask you about, but I'm trying to keep my questions in manageable little chunks so I do not overwhelm you and turn you off....

@msinclair2
Owner Author

msinclair2 commented Oct 23, 2017

It turns out all I needed to do was make the "generically_depends_on some MSO_class" axiom for an SO class an equivalentTo, rather than subclassOf. So long as the MSO is imported in the same space, a reasoner can automatically infer the correct hierarchy for SO based on the MSO classes they depend on.

I verified this with a toy ontology, where I manually created 4 SO classes and added only an equivalentTo "generically_depends_on some" axiom pointing at each MSO counterpart. With the MSO imported directly, I ran the reasoner, and the 4 SO classes were classified exactly corresponding to the MSO taxonomy.
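
The reason this works can be imitated without a reasoner. The Python sketch below is not the toy ontology itself: the class names and the MSO taxonomy in it are invented for illustration. It relies on one fact about existential restrictions: if A is a subclass of B, then (R some A) is subsumed by (R some B).

```python
# Toy MSO taxonomy, child -> parent. Class names are illustrative only.
MSO_PARENT = {
    "MSO:gene": "MSO:region",
    "MSO:intron": "MSO:region",
    "MSO:region": "MSO:sequence_molecular_entity",
}

# SO class -> MSO filler of its "generically_depends_on some ..." axiom.
SO_FILLER = {
    "SO:gene": "MSO:gene",
    "SO:intron": "MSO:intron",
    "SO:region": "MSO:region",
    "SO:sequence_feature": "MSO:sequence_molecular_entity",
}


def mso_ancestors(cls):
    """All strict ancestors of an MSO class, nearest first."""
    out = []
    while cls in MSO_PARENT:
        cls = MSO_PARENT[cls]
        out.append(cls)
    return out


def inferred_so_parents(axiom_kind):
    """Direct SO superclasses that follow from the MSO taxonomy.

    With EquivalentTo, each SO class lands under the SO class defined on
    its filler's nearest MSO ancestor. With SubClassOf the restriction is
    only a necessary condition, so no SO-to-SO subsumption follows.
    """
    if axiom_kind != "EquivalentTo":
        return {}
    by_filler = {filler: so for so, filler in SO_FILLER.items()}
    parents = {}
    for so, filler in SO_FILLER.items():
        for ancestor in mso_ancestors(filler):
            if ancestor in by_filler:
                parents[so] = by_filler[ancestor]
                break
    return parents
```

With `"EquivalentTo"` the four SO classes fall into the same shape as the MSO taxonomy; with `"SubClassOf"` the result is empty, which matches the behaviour described above.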

Big thanks to Mike Bada for pointing this out to me!


3 participants