INCITS MLO WG: Criteria for a Mid Level Ontology

This is a working document for the InterNational Committee for Information Technology Standards (INCITS) Ad Hoc Mid-Level Ontology (MLO) Working Group. The goal is to establish a set of criteria and/or heuristics for defining what a mid-level ontology is.

NB: The numerical ordering of the criteria is solely for the purpose of ease of reference in the discussion. If you would like to suggest a criterion or want to discuss an existing criterion, please use Issue #98 (https://github.com/CommonCoreOntology/CommonCoreOntologies/issues/98).

A Mid-Level Ontology (MLO) is an ontology that:

1. is designed to extend a top-level ontology and be extended by multiple domain and/or application ontologies.
2. only includes terms that are common across a variety of domains. (The challenge is to adequately define "variety of domains".)
3. includes terms that are used frequently, and does not contain terms that are much less common.
4. has a domain that can be expressed either as a single class or as a statement composed of classes and an object property within that ontology. The class or classes should sit at the root level of the ontology, with the remainder of the content consisting only of those terms and relations relevant to characterizing entities of this type.
5. limits the depth of coverage by limiting the number of levels of subtypes to a predefined number (e.g. no more than three levels of subtypes should be included for any given term or relation). Rationale: going down a hierarchy means the terms are increasingly specific and therefore less "mid-level".
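
The depth limit in the last criterion above can be checked mechanically. The following is a minimal Python sketch under one possible reading of the criterion (count the levels of subtypes beneath each term and flag any term with more than the allowed number). The function names and the sample hierarchy are hypothetical and are not taken from any existing ontology; the hierarchy is assumed to be acyclic.

```python
def subtype_depth(term, children):
    """Return the number of subtype levels beneath `term` (0 for a leaf)."""
    kids = children.get(term, [])
    if not kids:
        return 0
    return 1 + max(subtype_depth(kid, children) for kid in kids)


def terms_exceeding_depth(children, max_levels=3):
    """List every term that has more than `max_levels` levels of subtypes."""
    return [t for t in children if subtype_depth(t, children) > max_levels]


# Hypothetical hierarchy, for illustration only (term -> direct subtypes).
children = {
    "Artifact": ["Fastener"],
    "Fastener": ["Clothing Fastener"],
    "Clothing Fastener": ["Button"],
    "Button": ["Shank Button"],
}

# "Artifact" has four levels of subtypes beneath it, so it is flagged.
print(terms_exceeding_depth(children))
```

Under this reading, a flagged term signals that its lowest subtypes are candidates for moving down into a domain or application ontology.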

Line of Reasoning on Scope of Mid-Level Ontologies - delivered to WG on 11/4, added to Wiki on 11/8

Note: Slides are separated by horizontal rules

Common Core Ontology GitHub Issue #135, opened and commented on by Jim Schoening on August 26, 2021

“At the 5 Aug MLO Mtg, Barry Smith proposed we adopt rules of what terms belong in a Mid-Level Ontology (MLO) and which do not. The Project Proposal states,

“A mid-level ontology is a set of terms, definitions, and relations commonly used across multiple domains, which will enable conforming extensions for specific domains or applications.”

The following are strawman rules for how to determine if a term belongs in a Mid-level ontology (which will require governance to implement).

Rule 1) If a term is used (and needed) in two or more significant extensions, it should be moved up to the MLO. Likewise, if a term in a MLO is only needed in one significant extension, it should be moved down to that extension.

Rule 2) If Rule 1 causes an MLO to be too big (or projected to get too big), terms used mostly in one domain may be moved down to that extension, and other extensions will then need to refer to that term in the other extension.”

Definitions (from Information technology – Top-level ontologies (TLO) - Part 1: Requirements (ISO/IEC DIS 21838-1))

3.1 entity/object – anything perceivable or conceivable

3.2 class – general entity

3.3 particular – individual entity

3.5 expression – word or group of words or corresponding symbols that can be used in making an assertion

3.7 term – expression (3.5) that refers to some class (3.2) or to some particular (3.3)

What does it mean that “a term is used (and needed) in two or more significant extensions”?

Example term: ‘Button’ (qua item of apparel)

It is taken as granted that this term does not belong in a mid-level ontology.

Nevertheless, entities classified by the term are part of the subject matter of various domains including apparel, archeology, history, law, manufacturing, design, arts, and journalism.

Use by Reference: The entities classified by the term are used by some domains only by reference, that is, the entities are only described or depicted in some manner. Examples of such referential use of “button” are in the domains of history, design, arts, and journalism.

Incidental Use: In other domains, the entities classified by the term are only incidental participants in the subject matter of the domain, meaning that an entity of another type could participate in its place without changing the significance of the subject matter (e.g., there is nothing essential about a button being part of the evidence in a criminal case; that role could be played as well by an earring, belt loop, or clothing label). Examples of such incidental use of “button” are in the domains of archeology, law, and manufacturing. This leaves the domain of apparel as the sole domain using the term button.

Rule 1) (original) If a term is used (and needed) in two or more significant extensions, it should be moved up to the MLO. Likewise, if a term in a MLO is only needed in one significant extension, it should be moved down to that extension.

Rule 1 (amendment 1) A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used in two or more significant domains.

“Button” has a non-referential and non-incidental use in two domains

Taking it as acknowledged that “button” is part of the apparel domain, we establish that it is also part of the tailoring domain by demonstrating that the use of the term in the tailoring domain is

Non-referential: tailors fasten and remove buttons from garments

And

Non-incidental (i.e., no entities of a different type could fulfill the role buttons play in the subject matter): Buttons serve both functional and decorative roles on garments, and while snaps and Velcro can fulfill the same functional role, they cannot serve the same decorative role.

Thus, acts of tailoring that involve buttons do so in a way that is not incidental and by the rule even as amended, “button” belongs in a mid-level ontology, contrary to a strong intuition otherwise.
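
The argument above can be replayed as a small Python sketch of Rule 1 (amendment 1). The `Use` labels, the function names, and the choice to treat every listed domain as significant are assumptions made for illustration; the assignments of domains to kinds of use follow the ‘Button’ discussion.

```python
from enum import Enum


class Use(Enum):
    REFERENTIAL = "referential"  # entities are only described or depicted
    INCIDENTAL = "incidental"    # an entity of another type could fill the role
    ESSENTIAL = "essential"      # hypothetical label: non-referential and non-incidental


def qualifying_domains(uses):
    """Domains in which the term is used non-referentially and non-incidentally."""
    return sorted(d for d, use in uses.items() if use is Use.ESSENTIAL)


def belongs_in_mlo(uses, minimum=2):
    """Rule 1 (amendment 1), treating every listed domain as significant."""
    return len(qualifying_domains(uses)) >= minimum


button_uses = {
    "history": Use.REFERENTIAL,
    "design": Use.REFERENTIAL,
    "arts": Use.REFERENTIAL,
    "journalism": Use.REFERENTIAL,
    "archeology": Use.INCIDENTAL,
    "law": Use.INCIDENTAL,
    "manufacturing": Use.INCIDENTAL,
    "apparel": Use.ESSENTIAL,
}

print(belongs_in_mlo(button_uses))  # apparel alone is not enough

# Adding tailoring as a second qualifying domain flips the result.
button_uses["tailoring"] = Use.ESSENTIAL
print(belongs_in_mlo(button_uses))
```

The second result is the counterintuitive one: with tailoring counted, the amended rule admits ‘Button’, which motivates a closer look at what makes a domain significant.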

What does it mean that “A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used in two or more significant domains”?

We start from the assumption that “significance” will have different referents in different contexts. What is considered significant will depend on one’s interests and so a general definition of the word eludes us, at least at present.

Instead of a general definition of ‘significant’, we propose that it is sufficient for practical purposes to delimit the content of mid-level ontologies through a stipulative definition of ‘significant’. Consequences of this are:

Different mid-level ontologies may have different content depending on their chosen definition.
Mid-level ontologies can exist at different levels of generality in an ontology ecosystem (e.g., mid-level ontology, mid-level domain ontology).

Rule 1 (amendment 1) A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used in two or more significant domains.

Rule 1 (amendment 2) A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used to classify entities in two or more significant domains, where a domain’s significance is measured according to some clearly defined metric/feature.

Example of a metric for significance:

A mid-level ontology might use a modified version of Google’s PageRank algorithm to determine the significance of a domain. The role of links is played by the number of times a domain appears in information classifications, and the importance of a link is played by the level within the information classification at which the domain appears.

We have collected eight such classifications:

Library of Congress,
DBpedia,
schema.org,
NCES Classification,
Facebook Interests,
Twitter Topics,
CIDOC CRM, and
DODAF 2.02

Searching over these classifications reveals that while apparel does appear (in Facebook Interests as 2nd Tier terms under Shopping & Fashion, in Twitter Topics as 2nd Tier term under Fitness), tailoring does not. It is expected that application of a modified PageRank algorithm would result in neither apparel nor tailoring achieving a metric that crosses the threshold of being significant.
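
As a rough illustration of how such a metric might be computed, the Python sketch below scores a domain by summing a weight of 1/tier over its appearances in the classifications. This is a deliberately simplified tier-weighted count, not PageRank itself. The weighting scheme, the function name, and the threshold value are assumptions; the appearance data for apparel and tailoring are those reported above.

```python
def significance(appearances):
    """Sum a weight of 1/tier for each appearance of a domain in a classification."""
    return sum(1.0 / tier for _classification, tier in appearances)


# (classification, tier) pairs per domain, as reported in the search above.
appearances = {
    "apparel": [("Facebook Interests", 2), ("Twitter Topics", 2)],
    "tailoring": [],
}

THRESHOLD = 2.0  # hypothetical cut-off; a real MLO would have to stipulate its own

scores = {domain: significance(found) for domain, found in appearances.items()}
significant = {domain: score >= THRESHOLD for domain, score in scores.items()}

for domain in scores:
    print(domain, scores[domain], significant[domain])
```

With this weighting, a domain listed at the first tier of a classification contributes twice as much as one listed at the second tier, and a domain that appears in none of the classifications scores zero.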

Comment on the significance of domains (Mark Jensen)

Tailoring is subordinate to apparel, and thus of less significance when determining whether its use of ‘button’ counts as a second significant domain that would warrant the term ‘button’ appearing in an MLO. An ontology that necessarily depends on another ontology (that is, it logically depends on the other ontology as an extension of it and therefore imports it) does not qualify as significant, regardless of ranking metrics.

This point is not meant to provide sufficient evidence in general for the significance of an entire ontology or domain, just on a term-by-term basis. This example presumes both ontologies are significant. When deciding if a term is used by more than one significant ontology, if one of those ontologies imports the other, then the significance of the importing ontology is ignored, but just for this one case. E.g., the term ‘ethernet’ could be used directly by ontologies for the sensor and cyber domains. Both domains are deemed significant by whatever metric is being used. However, because the ontology for sensors imports the one for cyber, the term ‘ethernet’ would need to be directly used by at least one more significant ontology for it to qualify as an MLO term.
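
The import rule described in this comment can be sketched in Python as follows. The function name and the ‘networking’ ontology are hypothetical, and treating imports as transitive is an assumption that the comment does not state explicitly.

```python
def independent_users(users, imports):
    """Drop every ontology that imports, directly or transitively,
    another ontology that also uses the term."""
    def reachable(start):
        seen, stack = set(), list(imports.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(imports.get(node, ()))
        return seen

    others = set(users)
    return sorted(o for o in users if not (reachable(o) & (others - {o})))


imports = {"sensor": ["cyber"]}  # the sensor ontology imports the cyber ontology

# Both ontologies use 'ethernet', but only the cyber ontology counts.
print(independent_users(["sensor", "cyber"], imports))

# A third, independent user (hypothetical) brings the count up to two.
print(independent_users(["sensor", "cyber", "networking"], imports))
```

The term then qualifies only when the returned list contains at least two ontologies, each of which has also passed whatever significance metric is in use.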

This may be more of a practical or architectural criterion, but it is nonetheless relevant to how ontologies are built as extensions of one another. There is more work to do in refining this criterion. It is scoped not to the significance of the domain itself, but rather to how logical dependence and extension affect the use of significance metrics.

Comment on the significance of domains (Alex Cox)

Define "significant" extension ontology: Two immediate ways to determine the significance of an extension ontology are to look at (1) how big it is and (2) how important it is. Size is a simple metric to calculate and compare and certainly is not without merit; however, sheer volume of terms does not necessarily translate into significance or quality.

Talking about "importance" doesn't in itself clarify the matter any, but I find it slightly more tangible. The following are some candidate criteria for ranking the importance of an ontology:

The relevance/prevalence of the ontology or its content. Possible metrics include how many ontologies import the ontology, how many datasets and/or applications are focused on its content area, and the centrality of the ontology to other ontologies/content areas. For example, an upper-level domain ontology will tend to be more important than a lower-level domain ontology.
The popularity of the ontology. Possible metrics include how many users the ontology has, how many datasets leverage the ontology, how active its user community is (e.g. writing papers, doing research, or posting comments to issue trackers about it), and how highly the ontology community ranks it.
The utility of the ontology. An ontology about an obscure domain (e.g. US Stamps Issued in Iowa 1950-1980) probably has less utility than one about a more mainstream area of research that is part of daily life (e.g. computers). Similarly, an ontology that focuses on theoretical concepts is probably less useful than one that focuses on content for which data is widely available but not adequately handled.

Proposal: An MLO should be "thick" enough to provide value in the absence of extension ontologies. This means that, in most cases, an MLO must provide more than just a single level of terms extending from the upper-level ontology. Put another way, the upper-level ontology should provide the foundation for every other possible term and the MLO should provide the content to discuss the vast majority of entities at a high level of generality.

Bridge Terms (not used but still needed)

If a term is used in a domain, one would expect to encounter it in texts that describe and explain the behavior of entities in the domain; for example, one would expect an introductory biology text to include the term “cell”. Compiling terms from various domains would result in placing “cell”, “molecule”, and “mineral” into a mid-level ontology, among many others. If the top-level ontology of this mid-level ontology is Basic Formal Ontology, then the three terms would be sub-classes of the BFO Class “Object”.

Adhering to a commonly accepted best practice that a set of sibling sub-classes should all be defined using variants within a single category of differentia (e.g., differentiated by: color, function, shape, dimension) would require that the three presumptive sub-classes of “Object” (“Cell”, “Molecule”, and “Mineral”) be made sub-classes of classes that differentiate objects using some single category of differentia. One might venture classes such as: “Biological Object”, “Chemical Object” and “Geological Object”. While these terms, or some more well-considered set of terms, satisfy the best practice, they are likely not found in texts describing the domains. A name for this kind of term is “Bridge Term”, and the use of such terms is permitted in mid-level ontologies.

Rule 1 (amendment 2) A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used to classify entities in two or more significant domains, where a domain’s significance is measured according to some clearly defined metric/feature.

Rule 1 (amendment 3) A term is included in a mid-level ontology if and only if it is non-referentially and non-incidentally used to classify entities in two or more significant domains, where significance of domains is measured according to a clearly defined metric. A term is also included in a mid-level ontology if it is a bridge term that is a superclass of a term in a mid-level ontology and a subclass of another bridge class or a subclass of a class of the top-level ontology.

Elaboration/Revision of Definition of “Bridge Term” and introduction of “Grouping Term” (Alex Cox)

Define 'bridge term': Every term in an MLO should either be extended by MLO terms or by domain level terms -- that is, there should be no MLO leaf terms without (at least in principle) subclasses in one or more domain ontologies. Terms that are defined too narrowly to have subclasses belong in a domain or application ontology.

Just as no MLO term should sit outside the upper-level ontology hierarchy, no domain-level term should be a direct subtype of an upper-level term. That is, there should always be an MLO term to "bridge" domain-level terms to upper-level terms. If an adequate bridging term does not exist, the MLO is not comprehensive. Furthermore, either the domain-level term in question belongs in the MLO or a bridging term should be added to the MLO. For example, 'Button' should not be a direct subtype of bfo:MaterialEntity.
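
A check for missing bridge terms could look like the Python sketch below. The level labels, the function name, and the parent links are illustrative assumptions: the hierarchy is simplified to a single parent per term and is not taken from the actual BFO or CCO files.

```python
def unbridged_terms(parents, level):
    """Return domain-level terms whose direct parent is an upper-level term."""
    return sorted(
        term
        for term, parent in parents.items()
        if level[term] == "domain" and level[parent] == "upper"
    )


# Illustrative levels and parent links only (term -> direct supertype).
level = {
    "bfo:MaterialEntity": "upper",
    "cco:Artifact": "mid",
    "Button": "domain",
    "Zipper": "domain",
}
parents = {
    "cco:Artifact": "bfo:MaterialEntity",
    "Button": "bfo:MaterialEntity",  # violates the bridging requirement
    "Zipper": "cco:Artifact",        # bridged by an MLO term
}

print(unbridged_terms(parents, level))
```

Each term reported by such a check either belongs in the MLO itself or needs a bridging term added above it.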

Define 'grouping term': An MLO should include the top-level terms from most domain-level ontologies. This is especially true for upper-level domain ontologies. Exceptions can occur when the top-level terms extend from a higher-level domain ontology, such as is often the case for application ontologies. In this way, an MLO provides the terminology to both bridge to and group domain-level terms. This grouping feature is an important function of an MLO. In doing so, it provides greater semantic depth, enables more fine-grained queries, and makes ontologies easier for users to view and understand. For example, instead of adding Button, Toggle, Stud, Snap Fastener, Popper, Eyelet, Buckle, Zipper, Velcro/Hook & Loop, Frogging, Hook & Eye, Magnet, Grommet, Brooch, Safety Pin, and Set of Fabric Ties/Laces (see: https://www.thecreativecurator.com/clothes-fastenings/ ) as a flat list of terms under cco:Artifact alongside every other type of artifact, a grouping term such as 'Clothing Fastener' should be added to supertype this set of 16 terms (ignoring for the moment that this term almost certainly does not belong in an MLO).

Grouping terms often make it easier for developers to apply the principle of single differentia when creating subclasses. For example, 'Clothing Fastener' can be defined as "An Artifact that is designed to bear the Function to fasten Articles of Clothing." This complies with subtypes of Artifact being differentiated based on their Artifact Functions while enabling subtypes of 'Clothing Fastener' to be differentiated based on how they realize the Clothes Fastening Artifact Function. Care is necessary when adding grouping terms, however, in order to avoid accidentally creating multiple inheritance.

What does it mean for a mid-level ontology to extend from a top-level ontology?

A mid-level ontology extends from a top-level ontology if and only if

every class in a mid-level ontology is a sub-class of
a class in a mid-level ontology, or
a class in a top-level ontology.

Another way to say this is that the sum of the individuals in the extensions of the classes in a mid-level ontology form a subset of the sum of the individuals in the extensions of the classes in the top-level ontology from which it extends. Or even more simply by saying that a mid-level ontology extends from a top-level ontology if and only if there is no individual that is a member of some class in the mid-level ontology that is not a member of any class from the top-level ontology.
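
The class-level condition can be tested with a short Python sketch. It reads the definition as requiring that every mid-level class reach some top-level class by following sub-class links; the function name and the sample classes are illustrative, with “Object” standing in for the top-level ontology.

```python
def extends(mlo_parents, tlo_classes):
    """True if every MLO class reaches some TLO class via sub-class links.

    `mlo_parents` maps each MLO class to its direct superclasses, each of
    which may be another MLO class or a TLO class.
    """
    def reaches_tlo(cls, seen=()):
        if cls in tlo_classes:
            return True
        if cls in seen or cls not in mlo_parents:
            return False  # cycle, or a class unknown to both ontologies
        return any(reaches_tlo(p, seen + (cls,)) for p in mlo_parents[cls])

    return all(reaches_tlo(cls) for cls in mlo_parents)


tlo_classes = {"Object"}
mlo_parents = {
    "Biological Object": ["Object"],
    "Cell": ["Biological Object"],
}
print(extends(mlo_parents, tlo_classes))

# A class with no superclass in either ontology breaks the condition.
mlo_parents["Orphan"] = []
print(extends(mlo_parents, tlo_classes))
```

Relations are left out of the check on purpose, since the definition places no sub-relation requirement on them.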

It follows from this definition of extends that a mid-level ontology can contain relations that are not sub-relations of any other relation. Relations don’t introduce new individuals but rather combine individuals from the extensions of any classes into ordered n-tuples.

End of 11/8/21 additions
