Using uberon for text mining
Authors and contributors:
- Chris Mungall (author)
Date: 2014
Document Type: ontology_usage_article
This article describes how to use Uberon synonymy metadata for text mining. It may also serve as a guide to how synonyms are used in some OBO library ontologies.
Uberon largely follows (and extends) the synonym model derived from OBO format. Here a synonym assignment can be seen as a tuple consisting of:
- The subject (e.g. the OWL Class)
- The synonym literal (a string)
- A mandatory SCOPE assignment: EXACT, BROAD, NARROW, RELATED
- An optional synonym TYPE assignment. Example: ABBREVIATION, LATIN, PLURAL, ...
- Zero or more pieces of provenance (e.g. a PMID or a class ID from another ontology or an ORCID)
This document assumes the use of the above terminology (in particular the distinction between fixed scopes and extensible types).
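The tuple described above can be written down as a small record type. This is only a sketch: the class and field names (`Synonym`, `syn_type`, `provenance`) are illustrative choices, not part of Uberon or of any OBO library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Scope(Enum):
    """The four fixed OBO synonym scopes."""
    EXACT = "EXACT"
    BROAD = "BROAD"
    NARROW = "NARROW"
    RELATED = "RELATED"


@dataclass(frozen=True)
class Synonym:
    subject: str                       # the class, e.g. "UBERON:0001997"
    literal: str                       # the synonym string
    scope: Scope                       # mandatory, one of the fixed scopes
    syn_type: Optional[str] = None     # optional, drawn from an extensible list
    provenance: Tuple[str, ...] = ()   # zero or more xrefs (PMID, class ID, ORCID)


# One of the synonyms from the olfactory epithelium example in this article.
moe = Synonym("UBERON:0001997", "MOE", Scope.RELATED,
              "ABBREVIATION", ("NCBI:NBK55971",))
```

Modelling the scope as an enum and the type as a free string mirrors the distinction between fixed scopes and extensible types.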
Uberon uses the standard 4 OBO synonym scopes:
- EXACT
- BROAD
- NARROW
- RELATED
The standard obo2owl mapping is used here; consult the [obo spec](http://oboformat.org) for details. Currently the following annotation properties are used:
- hasExactSynonym
- hasBroadSynonym
- hasNarrowSynonym
- hasRelatedSynonym
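As a minimal sketch, the scope-to-property mapping can be kept in a lookup table. The namespace is the oboInOwl one that appears in the OWL example later in this article; the function name `synonym_property_iri` is hypothetical.

```python
OBO_IN_OWL = "http://www.geneontology.org/formats/oboInOwl#"

# Scope keyword in OBO format -> local name of the annotation property in OWL.
SCOPE_TO_PROPERTY = {
    "EXACT": "hasExactSynonym",
    "BROAD": "hasBroadSynonym",
    "NARROW": "hasNarrowSynonym",
    "RELATED": "hasRelatedSynonym",
}


def synonym_property_iri(scope: str) -> str:
    """Return the full annotation property IRI for a synonym scope."""
    return OBO_IN_OWL + SCOPE_TO_PROPERTY[scope.upper()]
```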
The uberon build pipeline ensures that no two classes share the same string as either a label or exact synonym. This helps detect common categories of errors.
(Note that when the composite ontologies are built, species-specific classes are relabeled; e.g. ZFA heart becomes "Zebrafish heart".)
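A uniqueness check of this kind might look like the sketch below. It is not the actual Uberon pipeline code: the input shape, the function name `find_clashes`, and the choice of case-sensitive string comparison are all assumptions made for illustration, and the class IDs in the usage are made up.

```python
from collections import defaultdict


def find_clashes(terms):
    """Find strings used as a label or exact synonym by more than one class.

    `terms` is an iterable of (class_id, label, exact_synonyms) triples.
    Returns a dict mapping each clashing string to the sorted class IDs.
    """
    owners = defaultdict(set)
    for class_id, label, exact_synonyms in terms:
        for text in [label, *exact_synonyms]:
            owners[text].add(class_id)
    return {text: sorted(ids) for text, ids in owners.items() if len(ids) > 1}


# Hypothetical input: EX:2 reuses the label of EX:1 as an exact synonym.
clashes = find_clashes([
    ("EX:1", "olfactory epithelium", ["olfactory membrane"]),
    ("EX:2", "nasal mucosa", ["olfactory epithelium"]),
])
```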
Languages are indicated by a lang tag, e.g. '@fr'. Note that this is not yet translated correctly into the OWL.
The exception is Latin, for which a LATIN synonym type is used.
Synonyms with foreign-language tags will most likely be maintained outside the core ontology; most of these will be auto-derived from third-party sources (NeuroNames; DBpedia is very comprehensive). Contact us if you would like to use these.
The ontology contains a growing list of synonym types or tags, which may be useful for text mining. See the ontology for a full list.
- ABBREVIATION - Acronym or abbreviation
- LATIN - Typically the TA preferred term
- DUBIOUS - the synonym may be contested or misleading
- DEPRECATED - a historic synonym that may be used in older texts but is discouraged in modern usage
- SENSU - a term typically used within a certain taxonomic scope
- ...
The synonym hierarchy can be viewed in Protégé.
Some tips:
- Plurals are generally instantiated only for non-standard pluralizations. We do not materialize synonyms like "hands" for "hand", unless the synonyms are inherited from another ontology
- We follow standard OBO practice and use singular forms. For example, we use "meninx" for an individual meningeal layer. "meninges" is a valid plural synonym, but it is also a valid primary label for the distinct class that represents the meningeal cluster
- The "abbreviation" type is typically used for any initialism. Some prefer the term "acronym", but "acronym" is sometimes reserved for initialisms that are words
- It is natural for non-exact synonym strings to be shared between classes, as language is labile. In cases where this kind of ambiguity is known to cause confusion, we annotate the synonym with INCONSISTENT. This flag may be useful in systems that perform text processing - e.g. matches based on these may be subject to additional validation
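A text-mining system could use scopes and types to decide which matches deserve a second look. The policy below is purely illustrative and not something Uberon prescribes: it assumes that any non-EXACT match, and any match on a synonym typed DUBIOUS or INCONSISTENT, is routed to additional validation. The function name is hypothetical.

```python
# Synonym types treated as warning flags under this illustrative policy.
SUSPECT_TYPES = {"DUBIOUS", "INCONSISTENT"}


def needs_validation(scope, syn_types=()):
    """Return True if a text match on this synonym should get extra checks."""
    if scope != "EXACT":
        # Non-exact strings may legitimately be shared between classes.
        return True
    return any(t in SUSPECT_TYPES for t in syn_types)
```

A stricter system might also route ABBREVIATION matches to validation, since short initialisms are often ambiguous in running text.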
We aim to eventually have provenance for all synonyms. Currently most of these are xrefs to species anatomy ontologies, but in future more will be PMIDs etc.
In some cases an NCBITaxon ID is used as synonym provenance. This indicates when a term is preferred or used within a particular taxonomic context.
We include a has_relational_adjective annotation property to indicate the adjectival form of the noun that describes the structure. For example, 'hippocampal' for Ammon's horn.
In the Uberon bridging axiom ontologies the 'OBO Foundry unique label' property is used to provide a label that is intended to be unique across the whole OBO Foundry. The unique labels are generated automatically by suffixing the ontology-provided label with a qualifying term.
For example, the FMA class for 'heart' has the OBO Foundry unique label 'heart (canonical adult human)' to disambiguate it from 'heart (adult mouse)' in MA or the embryonic heart as represented in EHDAA2.
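The suffixing rule can be sketched as follows. The two qualifiers are the ones quoted in the example above; the table layout and the function name `unique_label` are assumptions, and qualifiers for other ontologies are deliberately left out rather than guessed.

```python
# Qualifying terms per source ontology, taken from the example in the text.
QUALIFIERS = {
    "FMA": "canonical adult human",
    "MA": "adult mouse",
}


def unique_label(label: str, ontology: str) -> str:
    """Suffix an ontology-provided label with that ontology's qualifier."""
    return f"{label} ({QUALIFIERS[ontology]})"
```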
OBO-Format example:
[Term]
id: UBERON:0001997
name: olfactory epithelium
alt_id: UBERON:0004853
def: "Epithelium inside the nasal cavity that is responsible for detecting odors[WP]." [Wikipedia:Olfactory_epithelium]
xref: Wikipedia:Olfactory_epithelium
comment: Genes: V1Rs, Trpc2 present in lamprey
subset: uberon_slim
subset: vertebrate_core
synonym: "main olfactory epithelium" EXACT [NCBI:NBK55971]
synonym: "MOE" RELATED ABBREVIATION [NCBI:NBK55971]
synonym: "pseudostratified main olfactory epithelium" RELATED [http://www.ncbi.nlm.nih.gov/books/NBK55971/]
synonym: "nasal epithelium" RELATED []
synonym: "nasal sensory epithelium" RELATED []
synonym: "olfactory sensory epithelium" EXACT []
synonym: "sensory olfactory epithelium" EXACT []
synonym: "nasal cavity olfactory epithelium" EXACT [MA:0001325]
synonym: "olfactory membrane" EXACT [NIF_GrossAnatomy:birnlex_2703]
Equivalent OWL:
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "MA:0001325"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal cavity olfactory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "http://www.ncbi.nlm.nih.gov/books/NBK55971/"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "pseudostratified main olfactory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NCBI:NBK55971"^^xsd:string) Annotation(<http://www.geneontology.org/formats/oboInOwl#hasSynonymType> <http://purl.obolibrary.org/obo/uberon/core#ABBREVIATION>) <http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "MOE"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "olfactory sensory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NCBI:NBK55971"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "main olfactory epithelium"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal epithelium"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "nasal sensory epithelium"^^xsd:string)
AnnotationAssertion(Annotation(<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "NIF_GrossAnatomy:birnlex_2703"^^xsd:string) <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "olfactory membrane"^^xsd:string)
AnnotationAssertion(<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> <http://purl.obolibrary.org/obo/UBERON_0001997> "sensory olfactory epithelium"^^xsd:string)
For historic reasons, many synonym types (e.g. ABBREVIATION) are annoyingly IN ALL CAPS and have URIs that are ontology-specific rather than shared (using a hash fragment, as in uberon/core#ABBREVIATION above).
This may change: we would like to move towards a standard vocabulary for synonym types.
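For text mining it is often enough to read the synonym lines straight from the OBO file. The sketch below parses lines shaped like those in the OBO-format example above using a regular expression. It is a simplification: it does not handle every escape or trailing-modifier form permitted by the OBO format, and the names `ParsedSynonym` and `parse_synonym` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

SYNONYM_RE = re.compile(
    r'^synonym:\s+"((?:[^"\\]|\\.)*)"'   # quoted synonym literal
    r'\s+(EXACT|BROAD|NARROW|RELATED)'   # mandatory scope
    r'(?:\s+([^\s\[\]]+))?'              # optional synonym type
    r'\s+\[(.*)\]\s*$'                   # provenance list, possibly empty
)


class ParsedSynonym(NamedTuple):
    literal: str
    scope: str
    syn_type: Optional[str]
    xrefs: List[str]


def parse_synonym(line: str) -> Optional[ParsedSynonym]:
    """Parse one OBO 'synonym:' line; return None if it does not match."""
    m = SYNONYM_RE.match(line.strip())
    if m is None:
        return None
    literal, scope, syn_type, raw_xrefs = m.groups()
    # Simplification: xrefs are split on plain commas.
    xrefs = [x.strip() for x in raw_xrefs.split(",") if x.strip()]
    return ParsedSynonym(literal, scope, syn_type, xrefs)
```

Applied to the line `synonym: "MOE" RELATED ABBREVIATION [NCBI:NBK55971]`, this yields the literal, the RELATED scope, the ABBREVIATION type, and a one-element provenance list.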
Uberon is a multi-species anatomy ontology and knowledge base. Find out more on the home page.