
Using uberon for text mining

Chris Mungall edited this page Mar 27, 2014 · 14 revisions

Synonyms in uberon

Authors and contributors:

  • Chris Mungall (author)

Date: 2014

Document Type: ontology_usage_article

Abstract

This article describes how to use uberon synonymy metadata for text mining. It may also serve as a guide to how synonyms are used in some OBO library ontologies.

Background: OBO synonym model

Uberon largely follows (and extends) the synonym model derived from OBO format. Here a synonym assignment can be seen as a tuple consisting of:

  • The subject (e.g. the OWL Class)
  • The synonym literal (a string)
  • A mandatory SCOPE assignment: EXACT, BROAD, NARROW, RELATED
  • An optional synonym TYPE assignment. Example: ABBREVIATION, LATIN, PLURAL, ...
  • Zero or more pieces of provenance (e.g. a PMID or a class ID from another ontology or an ORCID)

This document assumes the use of the above terminology (in particular the distinction between fixed scopes and extensible types).
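
As a minimal sketch, the tuple described above could be modelled in Python as follows. The class name `Synonym` and its field names are illustrative assumptions, not part of OBO format or of any Uberon tooling.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The four fixed scopes; types, by contrast, are an extensible list.
SCOPES = ("EXACT", "BROAD", "NARROW", "RELATED")


@dataclass(frozen=True)
class Synonym:
    """One synonym assignment (hypothetical container for illustration)."""
    subject: str                      # e.g. a class ID such as "UBERON:0001997"
    literal: str                      # the synonym string
    scope: str                        # mandatory; must be one of SCOPES
    type: Optional[str] = None        # optional, e.g. "ABBREVIATION"
    provenance: Tuple[str, ...] = ()  # zero or more xrefs (PMID, class ID, ORCID)

    def __post_init__(self):
        # Reject anything outside the fixed scope vocabulary.
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")


moe = Synonym("UBERON:0001997", "MOE", "RELATED", "ABBREVIATION", ("NCBI:NBK55971",))
```

The scope is validated because it is drawn from a fixed vocabulary; the type is left as a free string because the list of types is open-ended.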

Synonym scopes

Uberon uses the standard 4 OBO synonym scopes:

  • EXACT
  • BROAD
  • NARROW
  • RELATED

The standard obo2owl mapping is used here; consult the OBO format spec (http://oboformat.org) for details. Currently the following annotation properties are used:

  • hasExactSynonym
  • hasBroadSynonym
  • hasNarrowSynonym
  • hasRelatedSynonym
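
For text-mining code that reads the OWL rendering, a lookup table between scopes and property IRIs may be convenient. The sketch below assumes the oboInOwl namespace that appears in the OWL example later on this page; the names `SCOPE_TO_PROPERTY` and `scope_of` are made up for illustration.

```python
# Namespace as it appears in the OWL example on this page.
OBOINOWL = "http://www.geneontology.org/formats/oboInOwl#"

# Scope keyword -> annotation property IRI.
SCOPE_TO_PROPERTY = {
    "EXACT": OBOINOWL + "hasExactSynonym",
    "BROAD": OBOINOWL + "hasBroadSynonym",
    "NARROW": OBOINOWL + "hasNarrowSynonym",
    "RELATED": OBOINOWL + "hasRelatedSynonym",
}

# Reverse lookup: annotation property IRI -> scope keyword.
PROPERTY_TO_SCOPE = {iri: scope for scope, iri in SCOPE_TO_PROPERTY.items()}


def scope_of(property_iri: str) -> str:
    """Return the scope for a synonym property IRI (KeyError if it is not one)."""
    return PROPERTY_TO_SCOPE[property_iri]
```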

Label uniqueness

The uberon build pipeline ensures that no two classes share the same string as either a label or exact synonym. This helps detect common categories of errors.

Note that when the composite ontologies are built, species-specific classes are relabeled with the species name; e.g. the ZFA class "heart" becomes "Zebrafish heart".
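
A check of this kind can be sketched in a few lines. This is not the actual build pipeline code; the function name, the input shape, and the `EX:` identifiers are assumptions made for the example, and strings are compared exactly, without case folding.

```python
from collections import defaultdict


def find_label_clashes(terms):
    """Find strings used as a label or exact synonym by more than one class.

    terms: {class_id: (label, [exact synonyms])}
    returns: {string: sorted list of class_ids sharing it}
    """
    owners = defaultdict(set)
    for class_id, (label, exact_synonyms) in terms.items():
        for text in [label, *exact_synonyms]:
            owners[text].add(class_id)
    return {text: sorted(ids) for text, ids in owners.items() if len(ids) > 1}


# Hypothetical input: EX:2 reuses the label of EX:1 as an exact synonym.
example = {
    "EX:1": ("olfactory epithelium", ["olfactory membrane"]),
    "EX:2": ("nasal mucosa", ["olfactory epithelium"]),
}
clashes = find_label_clashes(example)
```

Here `clashes` maps "olfactory epithelium" to both class IDs, which is the kind of error the uniqueness rule is meant to surface.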

Languages

Languages are indicated by a lang tag, e.g. '@fr'. Note that these tags are not yet translated correctly into the OWL.

The exception is Latin, for which a LATIN synonym type is used.

Foreign language tags will most likely be maintained outside the core ontology; most of these will be auto-derived from third-party sources (e.g. NeuroNames, or DBpedia, which is very comprehensive). Contact us if you would like to use these.

Synonym types

The ontology contains a growing list of synonym types or tags, which may be useful for text mining. See the ontology for a full list.

  • ABBREVIATION - Acronym or abbreviation
  • LATIN - Typically the TA preferred term
  • DUBIOUS - the synonym may be contested or misleading
  • DEPRECATED - a historic synonym that may be used in older texts but discouraged in modern usage
  • SENSU - a term typically used within a certain taxonomic scope
  • ...

The synonym hierarchy can be seen in Protege

[Image: the synonym type hierarchy as displayed in Protege]

Some tips:

  • Plural synonyms are generally only added for non-standard pluralizations. We do not materialize synonyms like "hands" for "hand", unless the synonyms are inherited from another ontology.
  • We follow standard OBO practice and use singular forms for labels. For example, "meninx" is used for an individual meningeal layer. "meninges" is a valid plural synonym, but it is also a valid primary label for the distinct class that represents the meningeal cluster.
  • The "abbreviation" type is typically used for any initialism. Some prefer the term "acronym", but "acronym" is sometimes reserved for initialisms that are pronounced as words.
  • It is natural for non-exact synonym strings to be shared between classes, as language is labile. In cases where this kind of ambiguity is known to cause confusion, we annotate the synonym with INCONSISTENT. This flag may be useful in systems that perform text processing, e.g. matches based on these synonyms may be subject to additional validation.
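
One way a dictionary-based tagger might act on scopes and types is sketched below. The policy itself (which types to skip or send for review) is an assumption chosen for illustration, not a recommendation from the Uberon project, and it treats INCONSISTENT as a synonym type.

```python
def matching_policy(scope, synonym_type=None):
    """Return 'accept', 'review', or 'skip' for a synonym (illustrative policy only)."""
    if synonym_type == "DUBIOUS":
        # Contested or misleading strings are left out of the dictionary.
        return "skip"
    if synonym_type in ("INCONSISTENT", "DEPRECATED", "ABBREVIATION"):
        # Ambiguous, historic, or very short strings get extra validation.
        return "review"
    if scope == "EXACT":
        return "accept"
    # BROAD, NARROW and RELATED matches are weaker evidence for the class.
    return "review"
```

A real system would likely tune these choices per corpus; for example, abbreviations such as "MOE" might be accepted only when the full form occurs nearby.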

Synonym provenance

We aim to eventually have provenance for all synonyms. Currently most of these are xrefs to species anatomy ontologies, but in future more will be PMIDs etc.

In some cases an NCBITaxon ID is used as synonym provenance. This indicates when a term is preferred or used within a particular taxonomic context.

Relational adjectives

We include a has_relational_adjective annotation property to record the adjectival form of the noun that names the structure. For example, 'hippocampal' for Ammon's horn.

OBO Foundry Unique Label

In the Uberon bridging axiom ontologies the 'OBO Foundry unique label' property is used to provide a label that is intended to be unique across the whole OBO Foundry. The unique labels are generated automatically by suffixing the ontology-provided label with a qualifying term.

For example, the FMA class for 'heart' has the OBO Foundry unique label 'heart (canonical adult human)' to disambiguate it from 'heart (adult mouse)' in MA or the embryonic heart as represented in EHDAA2.
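
The suffixing step can be illustrated as follows. The "label (qualifier)" format is inferred from the examples above, and the `QUALIFIERS` table contains only the two qualifiers quoted there; the function and table names are hypothetical and this is not the actual generation code.

```python
# Qualifying terms taken from the examples in the text; other ontologies omitted.
QUALIFIERS = {
    "FMA": "canonical adult human",
    "MA": "adult mouse",
}


def unique_label(label, ontology):
    """Suffix an ontology-provided label with that ontology's qualifying term."""
    return f"{label} ({QUALIFIERS[ontology]})"


fma_heart = unique_label("heart", "FMA")  # "heart (canonical adult human)"
ma_heart = unique_label("heart", "MA")    # "heart (adult mouse)"
```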

Examples

OBO-Format example:

[Term]
id: UBERON:0001997
name: olfactory epithelium
alt_id: UBERON:0004853
def: "Epithelium inside the nasal cavity that is responsible for detecting odors[WP]." [Wikipedia:Olfactory_epithelium]
xref: Wikipedia:Olfactory_epithelium
comment: Genes: V1Rs, Trpc2 present in lamprey
subset: uberon_slim
subset: vertebrate_core
synonym: "main olfactory epithelium" EXACT [NCBI:NBK55971]
synonym: "MOE" RELATED ABBREVIATION [NCBI:NBK55971]
synonym: "pseudostratified main olfactory epithelium" RELATED [http://www.ncbi.nlm.nih.gov/books/NBK55971/]
synonym: "nasal epithelium" RELATED []
synonym: "nasal sensory epithelium" RELATED []
synonym: "olfactory sensory epithelium" EXACT []
synonym: "sensory olfactory epithelium" EXACT []
synonym: "nasal cavity olfactory epithelium" EXACT [MA:0001325]
synonym: "olfactory membrane" EXACT [NIF_GrossAnatomy:birnlex_2703]
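
Synonym lines of the shape shown above can be split into literal, scope, optional type, and provenance with a regular expression. This is a rough sketch that only handles the simple forms in this example (it splits xrefs on commas and ignores xref descriptions and escaped commas); for real work an OBO parsing library is a better choice. The names `SYNONYM_LINE` and `parse_synonym` are made up here.

```python
import re

SYNONYM_LINE = re.compile(
    r'^synonym: "(?P<literal>(?:[^"\\]|\\.)*)" '   # quoted synonym string
    r'(?P<scope>EXACT|BROAD|NARROW|RELATED)'       # mandatory scope
    r'(?: (?P<type>[^\s\[\]]+))?'                  # optional synonym type
    r' \[(?P<xrefs>.*)\]$'                         # provenance list, may be empty
)


def parse_synonym(line):
    """Return (literal, scope, type or None, [xrefs]), or None for non-synonym lines."""
    m = SYNONYM_LINE.match(line.strip())
    if m is None:
        return None
    xrefs = [x.strip() for x in m.group("xrefs").split(",") if x.strip()]
    return m.group("literal"), m.group("scope"), m.group("type"), xrefs


parsed = parse_synonym('synonym: "MOE" RELATED ABBREVIATION [NCBI:NBK55971]')
# parsed == ("MOE", "RELATED", "ABBREVIATION", ["NCBI:NBK55971"])
```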

Equivalent OWL:

AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "MA:0001325"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal cavity olfactory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "http://www.ncbi.nlm.nih.gov/books/NBK55971/"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "pseudostratified main olfactory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NCBI:NBK55971"^^xsd:string) Annotation(<http://www.geneontology.org/formats/oboInOwl#hasSynonymType> <http://purl.obolibrary.org/obo/uberon/core#ABBREVIATION>) <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "MOE"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "olfactory sensory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NCBI:NBK55971"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "main olfactory epithelium"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal epithelium"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal sensory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NIF_GrossAnatomy:birnlex_2703"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "olfactory membrane"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "sensory olfactory epithelium"^^xsd:string)

In Protege 4 (P4):

[Image: the synonyms above as displayed in Protege 4]

URIs of synonym types

For historic reasons, many synonym types (e.g. ABBREVIATION) are written in all caps and have URIs that are ontology-specific rather than shared (using a hash fragment).

This may change; we would like to move towards a standard vocabulary for synonym types.
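
Because the type name currently sits in the URI fragment, code that wants the short name (e.g. ABBREVIATION) can take the part after the hash. The sketch below uses the type URI from the OWL example above; the function name is an assumption, and the approach would need revisiting if the URI scheme changes.

```python
from urllib.parse import urldefrag


def synonym_type_name(uri):
    """Return the fragment after '#', or an empty string if the URI has none."""
    return urldefrag(uri).fragment


name = synonym_type_name("http://purl.obolibrary.org/obo/uberon/core#ABBREVIATION")
# name == "ABBREVIATION"
```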
