From a3a55dfb534106d73667aa67c792999b0e12072d Mon Sep 17 00:00:00 2001 From: cmungall Date: Fri, 22 Sep 2023 15:11:15 -0700 Subject: [PATCH] Additional documentation --- docs/glossary.rst | 198 ++++- docs/guide/learning-more.rst | 5 + docs/guide/primary-labels.rst | 2 +- notebooks/Clinical/OMOP-Example.ipynb | 1115 +++++++++++++++++++------ 4 files changed, 1042 insertions(+), 278 deletions(-) diff --git a/docs/glossary.rst b/docs/glossary.rst index 5721e9329..a8b20e820 100644 --- a/docs/glossary.rst +++ b/docs/glossary.rst @@ -9,7 +9,9 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. .. glossary:: Ontology - A flexible concept loosely encompassing any collection of :term:`Ontology Elements` and statements or relationships connecting them + A flexible concept loosely encompassing any collection of :term:`Ontology Elements` and statements or relationships connecting them. + + - See also :ref:`basics` in the Guide. Ontology Element A discrete part of an :term:`Ontology`, with a unique persistent identifier. The most important elements are :term:`Terms`, but @@ -35,14 +37,20 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. In :term:`Semantic Web` and :term:`Linked Data` technologies, identifiers are always :term:`IRIs`, although they may be shortened to :term:`CURIEs` within individual documents. + - See also :ref:`curies_and_uris` in the Guide. + CURIE A :term:`CURIE` is a compact :term:`URI`. For example, ``CL:0000001`` is the CURIE for the root :term:`Class` in the Cell Ontology (which has the :term:`Prefix` ``CL``). + - See also :ref:`curies_and_uris` in the Guide. + URI A Uniform Resource Indicator, a generalization of URL. Most people think of URLs as being solely for addresses for web pages (or APIs) but in semantic web technologies, URLs can serve as actual identifiers for entities like ontology terms. Data models like :term:`OWL` and :term:`RDF` use URIs as :term:`identifiers`. 
In OAK, URIs are mapped to :term:`CURIEs`. + - See also :ref:`curies_and_uris` in the Guide. + Label Usually refers to a human-readable label corresponding to the ``rdfs:label`` :term:`predicate`. Labels are typically unique per ontology. In :term:`OBO Format` and in the bio-ontology literature, labels are sometimes called :term:`Names`. @@ -50,9 +58,14 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. In the context of OAK, :term:`Label` is used to refer to the ``rdfs:label`` :term:`Predicate`, or sometimes ``skos:prefLabel``. + - See also :ref:`primary_labels` in the Guide. + Name Usually synonymous with :term:`Label`, but in the formal logic and OWL community, "Name" sometimes denotes an :term:`Identifier` + - See also :ref:`primary_labels` in the Guide. + - See also :ref:`curies_and_uris` in the Guide. + Category The term :term:`Category` is frequently ambiguous. In the context of OAK it refers to a high-level grouping :term:`Class` that may come from an upper ontology like :term:`COB` or a schema language like @@ -71,6 +84,14 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. "annotations". Associations can be seen as special cases of :term:`Edges`, but it is often convenient to treat them differently (for example, associations frequently have additional metadata and evidence, and often have nuanced semantics that different from standard ontology edges). + Despite the differences, we still use the same terminology for associations as for :term:`Edges`. + The :term:`Subject` of an association is the named entity that the association is about; it could be + a gene, a person, a sample, a document, a disease, or any number of things. It could potentially be represented + by a node in an ontology, but it is more typically a database entity. + The :term:`Object` is the ontology term that is used as a descriptor for the subject. 
+ (Confusingly, in some formats, the "database object" actually refers to the *subject* of the association). + + - See also :ref:`associations` in the Guide. Text Annotation The process of annotating spans of texts within a text document with references to ontology terms, or the result of this @@ -81,19 +102,45 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. The term :term:`Mapping` is often used differently by different communities. In the context of OAK it means a pairwise association between two :term:`Ontology Elements`, where those elements are conceptually similar or close in meaning. OAK adheres closely to the :term:`SSSOM` data model. + Note that OAK treats mappings as distinct from ontology :term:`Associations` or + :term:`Edges`, due to different use cases for each of these structures. However, there are + commonalities, and we use the terms :term:`Subject`, :term:`Object`, and :term:`Predicate` in the same way + for each of these structures. + + - See also :ref:`mappings` in the Guide. SSSOM Simple Standard for Sharing Ontological Mappings. SSSOM is the primary :term:`Datamodel` in OAK for passing around :term:`Mappings`. + - See also :ref:`mappings` in the Guide. + Graph Formally a graph is a data structure consisting of :term:`Nodes` and :term:`Edges`. There are different forms of graphs, but for the purposes of OAK, an ontology graph has all :term:`Terms` as nodes, and relationships connecting terms (is-a, part-of) as edges. Note the concept of an ontology graph and an :term:`RDF` graph do not necessarily fully align - RDF graphs of OWL ontologies employ numerous blank nodes that obscure the ontology structure. See :term:`Ontology Graph Projection`. + - See also :ref:`relationships_and_graphs` in the Guide. + Edge See :term:`Relationship` + Relationship + A :term:`Relationship` is a typed connection between two ontology elements. 
The first element is called the :term:`Subject`, + and the second one the :term:`Object`, with the type of connection being the :term:`Predicate`. + Sometimes Relationships are equated with :term:`Triples` in :term:`RDF` but this can be confusing, because some relationships + map to *multiple* triples when following the OWL RDF serialization. An example is the relationship "finger part-of hand", + which in OWL is represented using a :term:`Existential Restriction` that maps to 4 triples. + + - See also :ref:`relationships_and_graphs` in the Guide. + + Triple + The term "triple" is generally only used in the context of the :term:`RDF` data model. A triple is a + simple statement consisting of a :term:`Subject`, :term:`Predicate`, and :term:`Object`. + The concept of triple is closely related to, but not identical to, the concept of :term:`Relationship`. + + - See also :ref:`relationships_and_graphs` in the Guide. + Node A :term:`Node` (aka Vertex) is one of the two main elements that make up a :term:`Graph`. The other element is an :term:`Edge`. The nodes in a graph typically represent :term:`Classes` @@ -101,6 +148,26 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. be :term:`Instances` or :term:`Relationship Types`, or metadata elements such as :term:`Subset` definitions. + - See also :ref:`relationships_and_graphs` in the Guide. + + Subject + The subject of a :term:`Relationship` or :term:`Association` is the first element. + The subject is always a :term:`Node`. + Note that the same node can be the Subject of one edge, and the :term:`Object` of another edge. + For example, the node for "Scoliosis" in the Human Phenotype Ontology is the subject of the SubClassOf + edge whose object is "Abnormality of the vertebral column"; it may also be the object of + a gene-phenotype association edge. + + - See also :ref:`relationships_and_graphs` in the Guide. + + Object + The term "Object" is highly overloaded. 
In a general programming context, + it refers to an instance of a (programmatic) class. But typically in the OAK + context, it refers to the second element in a :term:`Relationship` or :term:`Association`. + It is the counterpart to :term:`Subject`. + + - See also :ref:`relationships_and_graphs` in the Guide. + Relationship Type See :term:`Predicate` @@ -112,6 +179,8 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. * :term:`IS_A` (rdfs:subClassOf) * :term:`Part Of` (BFO:0000050) + - See also :ref:`relationships_and_graphs` in the Guide. + IS_A: The :term:`is-a` relationship type. This is a builtin construct in :term:`OWL` and is not represented as an :term:`Ontology Element`. In OAK, the :term:`IS_A` relationship type is @@ -129,9 +198,41 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. paramaterized by a set of :term:`Predicates`. The concept of :term:`Ancestor` and graph traversal is closely related to the concept of :term:`Entailment` in :term:`OWL`. + - See also :ref:`relationships_and_graphs` in the Guide. + Descendant The converse of :term:`Ancestor`. + Closure + In the context of ontologies and OAK "closure" refers to the closure of a predicate, i.e. the + :term:`Ancestor` of all entities that are reachable by following the predicate or predicates. + + - See also :ref:`relationships_and_graphs` in the Guide. + + Subject Closure + The :term:`Subject Closure` of an edge is the set of all entities that are reachable by following + the :term:`Subject` of the edge or association, over a specified set of predicates + (called the :term:`Subject Closure Predicates`). + For example, in a disease + phenotype association, if the disease is "Mucopolysaccharidosis type I", then the subject closure would + include "Mucopolysaccharidosis", "Lysosomal Storage Disease", "Disease". 
In cases where the subject + is a database entity rather than an ontology term, the subject closure may trivially be a singleton + containing only the subject. + + - See also :ref:`relationships_and_graphs` in the Guide. + - See also :ref:`associations` in the Guide. + + Object Closure + The :term:`Object Closure` of an edge is the set of all entities that are reachable by following + the :term:`Object` of the edge or association, over a specified set of predicates + (called the :term:`Object Closure Predicates`). + For example, in a disease + to phenotype association, if the phenotype is "Abnormality of the vertebral column", then the object closure would + include "Abnormality of the vertebral column", "Abnormality of the musculoskeletal system", etc. + + - See also :ref:`relationships_and_graphs` in the Guide. + - See also :ref:`associations` in the Guide. + Ontology Graph Projection The mapping between an ontology as represented in some formalism such as :term:`OWL` ontology onto a :term:`Graph`. This is a non-trivial process, because OWL ontologies are not natively represented as graphs, instead they are @@ -142,42 +243,60 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. OAK makes use of a simple projection where OWL existential axioms are mapped to :term:`Edges`, similar to :term:`Relation Graph`. + - See also :ref:`relationships_and_graphs` in the Guide. + Relation Graph Relation Graph is both a tool and a :term:`Ontology Graph Projection`. Relation Graph is used behind the scenes in both :term:`Ubergraph` and in :term:`Semantic SQL`. For the tool, see `INCATools/relation-graph `_. + - See also :ref:`relationships_and_graphs` in the Guide. + Ontology Format A syntax for serializing an :term:`Ontology` as text. Examples include :term:`OWL Functional Syntax`, various :term:`RDF` formats such as :term:`Turtle`, or :term:`OBO Format`. 
In OAK we take a broad view of the term "Ontology", and also include things such as RDF serializations of :term:`SKOS`. - See also: - - :term:`guide_ontology_languages` + + - See also :ref:`basics` in the Guide. + - See also `OWL Formats `_ in the OBook. OWL An ontology language that uses constructs from :term:`Description Logic`. OWL is not itself an ontology format, it can be serialized through different :term:`Ontology Formats` such as :term:`Functional Syntax`, and it can be mapped to :term:`RDF` and serialized via an RDF format. + - See also `OWL Formats `_ in the OBook. + RDF A :term:`Datamodel` consisting of simple :term:`Subject` :term:`Predicate` :term:`Object` :term:`Triples` organized into an RDF :term:`Graph` + - See also `OWL Formats `_ in the OBook. + FunOWL FunOWL is a Python :term:`Ontology Library` that provides a simple API for working with OWL ontologies conceptualized using the native OWL :term:`OWL Functional Syntax` representation. + - See ``_ + Functional Syntax A syntax / :term:`Ontology Format` that directly expresses the :term:`OWL` data model. + - See also `OWL Formats `_ in the OBook. + OBO Format An :term:`Ontology Format` designed for easy viewing, direct editing, and readable diffs. It is popular in bioinformatics, but not widely used or known outside the genomics sphere. OBO is mapped to OWL, but only expresses a subset, and provides some OWL - abstractions in a more easy to understand fashion. See: ``_ + abstractions in a more easy to understand fashion. + + - See: ``_ + - See also `OWL Formats `_ in the OBook. Pronto An :term:`Ontology Library` for parsing :term:`OBO Format` with some support for :term:`OWL` files. OAK provides a wrapper around Pronto via the :ref:`pronto_implementation`. + - See: ``_ + OBO Graphs A JSON-based serialization :term:`Ontology Format` and also a :term:`Datamodel` for representing :term:`Ontology Graphs`. 
OBO Graphs are designed to be an abstraction that is more suited to data science tasks than @@ -186,17 +305,23 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. Input Selector A syntax that provides a shorthand for selecting an :term:`Adapter` to communicate with an ontology. These may be command line based or for a remote endpoint. The syntax is typically ``:`` - but if a path is specified, a default adapter will be used. See :ref:`selectors`. + but if a path is specified, a default adapter will be used. + + - See :ref:`selectors`. OWL Annotation In the context of :term:`OWL`, the term :term:`Annotation` means a piece of metadata that does not have a strict logical interpretation. Annotations can be on entities, for example, :term:`Label` annotations, or annotations can be on :term:`Axioms`. + - See `Section 8.1 in the OWL primer `_ + Named Individual An :term:`Ontology Element` that represents an instance of a class. For example, the instance "John" or "John's heart". Note that instances are not commonly directly represented in bio-ontologies, but may be more common in other domains. + - See `Section 4.1 in the OWL primer `_ + Property An :term:`Ontology Element` that represents an attribute or a characteristic of an element. In :term:`OWL`, properties are divided into disjoint categories: @@ -209,6 +334,8 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. Object Properties are also used in :term:`Class` :term:`Axioms`, to express generalizations about how instances of those classes are necessarily related. + - See `Section 4.4 in the OWL primer `_ + AnnotationProperty In OWL, an :term:`AnnotationProperty` is a :term:`Property` that connects an :term:`Ontology Element` to another element for the purposes of assigning metadata. Annotation Properties are "logically @@ -220,19 +347,11 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. a :term:`Literal`. 
Datatype properties are not widely used in most bio-ontologies, and currently OAK has limited support for working with them. - Triple - A simple :term:`Relationship` that is a tuple of :term:`Subject`, :term:`Predicate`, and :term:`Object`. - - Relationship - A :term:`Relationship` is a type connection between two ontology elements. The first element is called the :term:`Subject`, - and the second one the :term:`Object`, with the type of connection being the :term:`Predicate`. - Sometimes Relationships are equated with :term:`Triples` in :term:`RDF` but this can be confusing, because some relationships - map to *multiple* triples when following the OWL RDF serialization. An example is the relationship "finger part-of hand", - which in OWL is represented using a :term:`Existential Restriction` that maps to 4 triples. - Logical Definition A :term:`Logical Definition` is a particular kind of :term:`Axiom` that is used to provide a - definition of a term that is *computable*. See :ref:`logical_definitions`. + definition of a term that is *computable*. + + - See :ref:`logical_definitions`. Subset An :term:`Ontology Element` that represents a named collection of elements, typically grouped for some purpose. @@ -242,6 +361,9 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. An ontology tool that will perform inference over an ontology to yield new *axioms* (e.g. new :term:`Edges`) or to determine if an ontology is logically :term:`Coherent`. + - See `Reasoning `_ in the OBook. + - See also :ref:`relationships_and_graphs` in the Guide. + Reasoning See :term:`Reasoner` and :term:`Entailed` @@ -249,21 +371,36 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. An :term:`Ontology Repository` that is a comprehensive collection of multiple biologically relevant ontologies. Bioportal exposes an :term:`API` endpoint, that is utilized by the OAK :ref:`bioportal_implementation`. + - See ``_ + - See :ref:`bioportal_implementation`. 
+ + OntoPortal + A framework for :term:`Ontology Repositories` that is used by :term:`Bioportal`, + as well as AgroPortal, EcoPortal, etc. + - See :ref:`bioportal_implementation`. + Asserted An :term:`Axiom` or :term:`Edge` that is directly asserted in an ontology, as opposed to being :term:`Entailed`. Note that asserted edges or axioms usually correspond to :term:`Direct` (one-hop) edges, but this isn't always the case. + - See `Reasoning `_ in the OBook. + Entailed An :term:`Axiom` or :term:`Edge` that is is inferred by a :term:`Reasoner`. Note that all asserted edges or axioms are also entailed. Note also that sometimes entailed axioms can include trivial :term:`Tautologies`. + - See `Reasoning `_ in the OBook. + - See also :ref:`relationships_and_graphs` in the Guide. + Graph Traversal A strategy for walking :term:`graphs`, such as from a start node to all ancestors or descendants. In some cases, graph traversal can be used in place of :term:`Reasoning`. See the section on :ref:`relationships_and_graphs` in the OAK guide. + - See also :ref:`relationships_and_graphs` in the Guide. + Reflexive A :term:`Edge` or :term:`Axiom` that connects an :term:`Ontology Element` to itself. These are trivially true (:term:`Tautology`), but in general these are included by @@ -278,6 +415,9 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. many from :term:`OBO`. OLS exposes an :term:`API` endpoint, that is utilized by the OAK OLS :term:`Implementation` + - See ``_ + - See :ref:`ols_implementation`. + Triplestore A :term:`Graph` database that stores :term:`Triples` in a :term:`RDF` :term:`Graph`. Triplestores are used to store :term:`Ontology` data, and to provide :term:`SPARQL` querying over the data. @@ -298,24 +438,38 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. Accessible via :ref:`ubergraph_implementation`. Ubergraph includes inferred :term:`Relation Graph` edges as triples. + - See ``_ + - See :ref:`ubergraph_implementation`. 
+ Ontobee A :term:`Triplestore` and a :term:`Ontology Repository` that allows for :term:`SPARQL` querying of integrated :term:`OBO` ontologies. Accessible via :ref:`ontobee_implementation`. + - See ``_ + - See :ref:`ontobee_implementation`. + Semantic SQL Semantic SQL is a proposed standardized schema for representing any RDF/OWL ontology, plus a set of tools for building - a database conforming to this schema from RDF/OWL files. See `Semantic-SQL `_ + a database conforming to this schema from RDF/OWL files. + + - See `Semantic-SQL `_ Diff A representation of an individual difference between two :term:`Ontologies`. + - See :ref:`differ_interface`. + Patch A representation of a set of :term:`Diffs` that are intended to be applied. + - See :ref:`patcher_interface`. + KGCL Knowledge Graph Change Language (KGCL) is a :term:`Datamodel` for communicating desired changes (aka :term:`Patch`) to an ontology. It can also be used to communicate :term:`Diffs` between two ontologies. See `KGCL docs `_. + - See :ref:`patcher_interface`. + Semantic Similarity A means of measuring similarity between either pairs of ontology concepts, or between entities annotated using ontology concepts. There is a wide variety of different methods for calculating semantic similarity, for example :term:`Jaccard Similarity` @@ -336,6 +490,8 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. A programmatic abstraction that allows us to focus on *what* something should do rather than *how* it is done. Contrast with :ref:`Interface`. The *how* is managed by an :term:`Implementation`. + - See :ref:`interfaces`. + Implementation Also known as :term:`Adapter`. Typically the details of implementation should not be exposed, and developers of applications that use OAK should always :term:`Code to the Interface`. @@ -344,6 +500,8 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. 
triplestore like :term:`Ubergraph`, a :term:`Semantic SQL` adapter, or a local :term:`OBO Graphs` file. See the list of :ref:`all implementations` + - See :ref:`implementations`. + Datamodel Aka schema. OAK follows a pluralistic worldview, and includes many different datamodels for different purposes. Examples include: @@ -352,8 +510,10 @@ For a deeper dive into some of these concepts, see the :ref:`guide`. - A data model for representing :term:`Text Annotation` results - The :term:`SSSOM` data model, for representing :term:`Mappings` - A data model for representing :term:`Semantic Similarity` results - See the list of all :ref:`datamodels`. + + - See :ref:`datamodels`. OntoGPT A framework built on OAK that combines ontologies and Large Language Models. - See ``_ + + - See ``_ diff --git a/docs/guide/learning-more.rst b/docs/guide/learning-more.rst index 19b7bd8d1..2762e8a3f 100644 --- a/docs/guide/learning-more.rst +++ b/docs/guide/learning-more.rst @@ -9,3 +9,8 @@ For now, you can explore some more advanced OAK concepts through the use of diff :term:`Interfaces`. See the :ref:`interfaces` section for more information. + +Other resources: + +- `The OBO Academy Book (OBook) `_ +- `The OWL Primer `_ diff --git a/docs/guide/primary-labels.rst b/docs/guide/primary-labels.rst index e9aded4a4..131ee7893 100644 --- a/docs/guide/primary-labels.rst +++ b/docs/guide/primary-labels.rst @@ -1,4 +1,4 @@ -.. _primary-labels: +.. 
_primary_labels: Primary Labels ============== diff --git a/notebooks/Clinical/OMOP-Example.ipynb b/notebooks/Clinical/OMOP-Example.ipynb index 5cfe1a741..509ad3990 100644 --- a/notebooks/Clinical/OMOP-Example.ipynb +++ b/notebooks/Clinical/OMOP-Example.ipynb @@ -4,42 +4,35 @@ "cell_type": "markdown", "id": "1940578a", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ - "# OMOP Examples\n", + "# OMOP/N3C Examples\n", "\n", - "See https://github.com/jhu-bids/TermHub/issues/516\n" + "See [https://github.com/jhu-bids/TermHub/issues/516](https://github.com/jhu-bids/TermHub/issues/516)\n" ] }, { "cell_type": "markdown", "id": "3b8c6747", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ - "## Basic term lookup" + "## Basic term lookup\n", + "\n", + "First we will create an adapter object for a previously created sqlite database containing the N3C version of OMOP." 
] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 88, "id": "4de67698", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:00.757982Z", - "start_time": "2023-08-19T02:02:58.831821Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:34:37.849023Z", + "start_time": "2023-09-15T14:34:37.843989Z" } }, "outputs": [], @@ -48,18 +41,25 @@ "adapter = get_adapter('input/n3c.db')" ] }, + { + "cell_type": "markdown", + "source": [ + "Next we will do a lookup -- note in OAK all IDs are compact URIs (prefixed identifiers)" + ], + "metadata": { + "collapsed": false + }, + "id": "eed28476b310828b" + }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 89, "id": "c58acbfa", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:00.777882Z", - "start_time": "2023-08-19T02:03:00.760498Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:34:48.128627Z", + "start_time": "2023-09-15T14:34:48.120532Z" } }, "outputs": [ @@ -78,39 +78,72 @@ }, { "cell_type": "markdown", - "id": "5752d298", + "source": [ + "If you have a list of ids you can use the `labels` method:" + ], + "metadata": { + "collapsed": false + }, + "id": "7f67d01d579dc1b4" + }, + { + "cell_type": "code", + "execution_count": 90, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "omop:4195673 Angioplasty of posterior tibial artery\n", + "omop:45933598 Postsurgical Percutaneous Transluminal Coronary Angioplasty Status\n" + ] + } + ], + "source": [ + "for id, label in adapter.labels([TERM_ID, \"omop:45933598\"]):\n", + " print(id, label)" + ], "metadata": { "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:34:57.212788Z", + "start_time": "2023-09-15T14:34:57.205303Z" } }, + "id": "16d59cacac35ddbe" + }, + { + "cell_type": "markdown", + "id": 
"5752d298", + "metadata": { + "collapsed": false + }, "source": [ - "## Basic Search" + "## Basic Search\n", + "\n", + "OAK has a number of different ways to search ontologies or to do boolean queries combining\n", + "lexical search and graph constraints.\n", + "\n", + "We will start with a simple lookup-by-label, which considered the most basic search:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 91, "id": "77c6eb24", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:05.066505Z", - "start_time": "2023-08-19T02:03:00.780740Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:35:55.263063Z", + "start_time": "2023-09-15T14:35:50.926077Z" } }, "outputs": [ { "data": { - "text/plain": [ - "['omop:1018433']" - ] + "text/plain": "['omop:1018433']" }, - "execution_count": 3, + "execution_count": 91, "metadata": {}, "output_type": "execute_result" } @@ -119,18 +152,25 @@ "list(adapter.basic_search(\"Angioplasty\"))" ] }, + { + "cell_type": "markdown", + "source": [ + "Next we will do a partial string match search -- for this we will need a SearchConfiguration object:" + ], + "metadata": { + "collapsed": false + }, + "id": "cb98e0bcc0a6956d" + }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 93, "id": "0e1054d0", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:09.387547Z", - "start_time": "2023-08-19T02:03:05.069847Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:37:31.532018Z", + "start_time": "2023-09-15T14:37:26.980092Z" } }, "outputs": [ @@ -154,29 +194,71 @@ }, { "cell_type": "markdown", - "id": "76f14f50", + "source": [ + "Note we truncated the results to 5 for brevity.\n", + "\n", + "Lexical search is currently quite slow for the sqlite backend. 
In future it will be possible\n", + "to use a hybrid approach with lucene-backed search using Solr.\n", + "\n", + "## Command Line Simple Search\n", + "\n", + "We can do the same thing on the command line:" + ], + "metadata": { + "collapsed": false + }, + "id": "308da9d982fd580f" + }, + { + "cell_type": "code", + "execution_count": 94, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "omop:45933598 ! Postsurgical Percutaneous Transluminal Coronary Angioplasty Status\r\n", + "omop:1389714 ! Percutaneous transluminal angioplasty of native or recurrent coarctation of the aorta\r\n", + "omop:2101787 ! Anesthesia for angioplasty (Deprecated)\r\n", + "omop:2107779 ! Transluminal balloon angioplasty, open; renal or other visceral artery\r\n", + "omop:2107780 ! Transluminal balloon angioplasty, open; aortic\r\n" + ] + } + ], + "source": [ + "!runoak -i $db/omop.db info l~Angioplasty | head -5" + ], "metadata": { "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:37:46.346753Z", + "start_time": "2023-09-15T14:37:31.528786Z" } }, + "id": "a1d351f131246dae" + }, + { + "cell_type": "markdown", + "id": "76f14f50", + "metadata": { + "collapsed": false + }, "source": [ - "## Graph Queries" + "## Graph Queries\n", + "\n", + "Ontologies and ontology-like structures can be projected onto graphs, and OAK provides a number of\n", + "graph-oriented queries. For now we only consider IS_A graphs." 
] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 95, "id": "0de6da3d", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:09.411940Z", - "start_time": "2023-08-19T02:03:09.391590Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:38:16.462998Z", + "start_time": "2023-09-15T14:38:16.438933Z" } }, "outputs": [ @@ -222,16 +304,82 @@ }, { "cell_type": "code", - "execution_count": 6, - "id": "23dceb5e", + "execution_count": 96, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "rdfs:subClassOf omop:4000756 Leg repair\n", + "rdfs:subClassOf omop:4050134 Angioplasty of crural artery\n", + "rdfs:subClassOf omop:4091623 Surgical repair of lower extremity\n", + "rdfs:subClassOf omop:4184453 Operative procedure on lower leg\n", + "rdfs:subClassOf omop:4012185 Cardiovascular surgical procedure\n", + "rdfs:subClassOf omop:4181322 Surgical repair procedure by body site\n", + "rdfs:subClassOf omop:4301351 Surgical procedure\n", + "rdfs:subClassOf omop:4181193 Limb operation\n", + "rdfs:subClassOf omop:4302652 Angioplasty of blood vessel\n", + "rdfs:subClassOf omop:4311041 Repair of artery\n", + "rdfs:subClassOf omop:4190070 Angioplasty of artery of lower extremity\n", + "rdfs:subClassOf omop:4002031 Cardiovascular system repair\n", + "rdfs:subClassOf omop:4148948 Vascular surgery procedure\n", + "rdfs:subClassOf omop:4311041 Repair of artery\n", + "rdfs:subClassOf omop:4331725 Operative procedure on artery of extremity\n", + "rdfs:subClassOf omop:4030028 Surgical procedure on lower extremity\n", + "rdfs:subClassOf omop:4181322 Surgical repair procedure by body site\n", + "rdfs:subClassOf omop:4012185 Cardiovascular surgical procedure\n", + "rdfs:subClassOf omop:4159949 Surgical procedure on soft tissue\n", + "rdfs:subClassOf omop:4301351 Surgical procedure\n", + "rdfs:subClassOf omop:4030028 Surgical procedure on lower extremity\n", + 
"rdfs:subClassOf omop:4148948 Vascular surgery procedure\n", + "rdfs:subClassOf omop:4185115 Surgical repair\n", + "rdfs:subClassOf omop:4301351 Surgical procedure\n", + "rdfs:subClassOf omop:4185115 Surgical repair\n", + "rdfs:subClassOf omop:4030028 Surgical procedure on lower extremity\n", + "rdfs:subClassOf omop:4301351 Surgical procedure\n", + "rdfs:subClassOf omop:4050128 Angioplasty of artery\n", + "rdfs:subClassOf omop:4062347 Surgical repair of artery of extremity\n", + "rdfs:subClassOf omop:4091623 Surgical repair of lower extremity\n", + "rdfs:subClassOf omop:4160912 Vascular surgical procedure on lower limb\n", + "rdfs:subClassOf omop:4177089 Surgical repair procedure by device\n", + "rdfs:subClassOf omop:46271049 Angioplasty of peripheral blood vessel\n", + "rdfs:subClassOf omop:4000756 Leg repair\n", + "rdfs:subClassOf omop:4050134 Angioplasty of crural artery\n", + "rdfs:subClassOf omop:4054559 Repair of blood vessel\n", + "rdfs:subClassOf omop:4324523 Dilation procedure\n", + "rdfs:subClassOf omop:4054559 Repair of blood vessel\n", + "rdfs:subClassOf omop:4301351 Surgical procedure\n", + "rdfs:subClassOf omop:4148948 Vascular surgery procedure\n", + "rdfs:subClassOf omop:4181193 Limb operation\n", + "rdfs:subClassOf omop:4002031 Cardiovascular system repair\n", + "rdfs:subClassOf omop:4324523 Dilation procedure\n" + ] + } + ], + "source": [ + "ancs = list(adapter.ancestors([TERM_ID], predicates=[IS_A]))\n", + "for a in ancs:\n", + " for _, p, o in adapter.relationships([a], predicates=[IS_A]):\n", + " print(p, o, adapter.label(o))" + ], "metadata": { + "collapsed": false, "ExecuteTime": { - "end_time": "2023-08-19T02:03:12.290810Z", - "start_time": "2023-08-19T02:03:09.411017Z" - }, + "end_time": "2023-09-15T14:39:42.226228Z", + "start_time": "2023-09-15T14:39:42.054005Z" + } + }, + "id": "a9611ee1302a18ce" + }, + { + "cell_type": "code", + "execution_count": 97, + "id": "23dceb5e", + "metadata": { "collapsed": false, - "jupyter": { - 
"outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:40:11.622299Z", + "start_time": "2023-09-15T14:40:08.646936Z" } }, "outputs": [ @@ -257,27 +405,21 @@ "cell_type": "markdown", "id": "1230ac38", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ - "## Semantic Similarity" + "## Simple Pairwise Semantic Similarity" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 98, "id": "cda772fc", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:03:17.051975Z", - "start_time": "2023-08-19T02:03:12.293483Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:40:28.227977Z", + "start_time": "2023-09-15T14:40:23.401277Z" } }, "outputs": [ @@ -290,8 +432,7 @@ "ancestor_id: omop:4302652\n", "ancestor_information_content: 10.432664389401875\n", "jaccard_similarity: 0.3793103448275862\n", - "phenodigm_score: 1.9892756287187816\n", - "\n" + "phenodigm_score: 1.9892756287187816\n" ] } ], @@ -304,10 +445,7 @@ "cell_type": "markdown", "id": "713ccfd2", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ "## Paths" @@ -315,16 +453,13 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 99, "id": "a4922766", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:04:57.473444Z", - "start_time": "2023-08-19T02:04:57.379279Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:40:53.629915Z", + "start_time": "2023-09-15T14:40:53.494915Z" } }, "outputs": [ @@ -332,6 +467,14 @@ "name": "stdout", "output_type": "stream", "text": [ + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4195673', 'Angioplasty of posterior tibial artery')]\n", + "[('omop:4195673', 'Angioplasty of 
posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4000756', 'Leg repair')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4091623', 'Surgical repair of lower extremity')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4181322', 'Surgical repair procedure by body site')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4161058', 'Surgical repair of head and neck structure')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4161828', 'Repair of eyelid')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4330850', 'Reconstruction of eyelid')]\n", + "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness')]\n", "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4195673', 'Angioplasty of posterior tibial artery')]\n", "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4000756', 'Leg repair')]\n", "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4091623', 'Surgical repair of lower extremity')]\n", @@ -347,14 +490,6 @@ "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4185115', 'Surgical repair')]\n", "[('omop:4195673', 'Angioplasty of posterior 
tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4045162', 'Reconstruction procedure')]\n", "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4330850', 'Reconstruction of eyelid')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4195673', 'Angioplasty of posterior tibial artery')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4000756', 'Leg repair')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4091623', 'Surgical repair of lower extremity')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4181322', 'Surgical repair procedure by body site')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4161058', 'Surgical repair of head and neck structure')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4161828', 'Repair of eyelid')]\n", - "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4330850', 'Reconstruction of eyelid')]\n", "[('omop:4195673', 'Angioplasty of posterior tibial artery'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness'), ('omop:4210771', 'Reconstruction of eyelid, full-thickness')]\n" ] } @@ -368,10 +503,7 @@ "cell_type": "markdown", 
"id": "7a43673f", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ "## Subgraphs" @@ -379,16 +511,13 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 100, "id": "7c9d31c3", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:09:05.190787Z", - "start_time": "2023-08-19T02:09:05.113987Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:41:46.530407Z", + "start_time": "2023-09-15T14:41:46.438986Z" } }, "outputs": [], @@ -399,16 +528,13 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 101, "id": "c75e6964", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:09:08.917990Z", - "start_time": "2023-08-19T02:09:07.817451Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T14:42:08.985408Z", + "start_time": "2023-09-15T14:42:07.753274Z" } }, "outputs": [], @@ -421,40 +547,90 @@ "cell_type": "markdown", "id": "1d249bf2", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ "![img](output/angioplasty.png)" ] }, + { + "cell_type": "code", + "execution_count": 102, + "outputs": [], + "source": [ + "from oaklib.utilities.obograph_utils import default_stylemap_path\n", + "\n", + "stylemap = default_stylemap_path()\n", + "graph_to_image(g, seeds=seeds, imgfile=\"output/angioplasty-styled.png\", format=\"png\", stylemap=stylemap)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:42:57.214723Z", + "start_time": "2023-09-15T14:42:56.284130Z" + } + }, + "id": "5ed46bd4f28c6f46" + }, { "cell_type": "markdown", - "id": "1a899a25", + "source": [ + "![img](output/angioplasty-styled.png)" + ], + "metadata": { + "collapsed": false + }, + "id": "8f4be3bf83f6d11e" + }, + { + "cell_type": "code", + "execution_count": 36, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "omop:2733612 Extraction of Left Brachial Vein, Percutaneous Approach ==> omop:4054559 Repair of blood vessel\n", + "omop:4054559 Repair of blood vessel ==> omop:4301351 Surgical procedure\n", + "omop:4195673 Angioplasty of posterior tibial artery ==> omop:4054559 Repair of blood vessel\n", + "omop:4210771 Reconstruction of eyelid, full-thickness ==> omop:4301351 Surgical procedure\n" + ] + } + ], + "source": [ + "for s, p, o in adapter.gap_fill_relationships([\"omop:4195673\", \"omop:4210771\", \n", + " \"omop:2733612\", \"omop:4054559\", \"omop:4301351\"],\n", + " predicates=[IS_A]):\n", + " print(s, adapter.label(s), \"==>\", o, adapter.label(o))" + ], "metadata": { "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:25:37.452226Z", + "start_time": "2023-09-15T04:25:37.430138Z" } }, + "id": "58f28aea969eea87" + }, + { + "cell_type": "markdown", + "id": "1a899a25", + "metadata": { + "collapsed": false + }, "source": [ "## Export to networkx" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 37, "id": "75adc8f8", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:10:44.733552Z", - "start_time": "2023-08-19T02:10:44.729004Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:25:48.291493Z", + "start_time": "2023-09-15T04:25:48.276439Z" } }, "outputs": [], @@ -465,26 +641,21 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 38, "id": "bf22958c", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:10:55.153851Z", - "start_time": "2023-08-19T02:10:55.148269Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:25:48.812201Z", + "start_time": "2023-09-15T04:25:48.793466Z" } }, "outputs": [ { "data": { - "text/plain": [ - 
"NodeView(('omop:4027561', 'omop:2733612', 'omop:40489873', 'omop:4177089', 'omop:4302652', 'omop:4054559', 'omop:4324523', 'omop:4301351', 'omop:4002031', 'omop:4148948', 'omop:4012185', 'omop:4159949', 'omop:4181322', 'omop:4185115', 'omop:4134598', 'omop:4330850', 'omop:4210771', 'omop:4045162', 'omop:4161828', 'omop:4161058', 'omop:4249123', 'omop:4139008', 'omop:4031321', 'omop:4154279', 'omop:4233946', 'omop:4040721', 'omop:4000756', 'omop:4195673', 'omop:4050134', 'omop:4190070', 'omop:4050128', 'omop:4062347', 'omop:4091623', 'omop:4160912', 'omop:46271049', 'omop:4030028', 'omop:4181193', 'omop:4311041', 'omop:4331725', 'omop:4184453'))" - ] + "text/plain": "NodeView(('omop:4027561', 'omop:2733612', 'omop:40489873', 'omop:4177089', 'omop:4302652', 'omop:4054559', 'omop:4324523', 'omop:4301351', 'omop:4002031', 'omop:4148948', 'omop:4012185', 'omop:4159949', 'omop:4181322', 'omop:4185115', 'omop:4134598', 'omop:4330850', 'omop:4210771', 'omop:4045162', 'omop:4161828', 'omop:4161058', 'omop:4249123', 'omop:4139008', 'omop:4031321', 'omop:4154279', 'omop:4233946', 'omop:4040721', 'omop:4000756', 'omop:4195673', 'omop:4050134', 'omop:4190070', 'omop:4050128', 'omop:4062347', 'omop:4091623', 'omop:4160912', 'omop:46271049', 'omop:4030028', 'omop:4181193', 'omop:4311041', 'omop:4331725', 'omop:4184453'))" }, - "execution_count": 12, + "execution_count": 38, "metadata": {}, "output_type": "execute_result" } @@ -495,65 +666,21 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 39, "id": "60cba513", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:12:39.526490Z", - "start_time": "2023-08-19T02:12:39.518261Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:25:49.419066Z", + "start_time": "2023-09-15T04:25:49.414123Z" } }, "outputs": [ { "data": { - "text/plain": [ - "{'omop:4027561': 0.05128205128205128,\n", - " 'omop:2733612': 0.05128205128205128,\n", - " 
'omop:40489873': 0.07692307692307693,\n", - " 'omop:4177089': 0.07692307692307693,\n", - " 'omop:4302652': 0.10256410256410256,\n", - " 'omop:4054559': 0.10256410256410256,\n", - " 'omop:4324523': 0.07692307692307693,\n", - " 'omop:4301351': 0.1794871794871795,\n", - " 'omop:4002031': 0.10256410256410256,\n", - " 'omop:4148948': 0.1282051282051282,\n", - " 'omop:4012185': 0.07692307692307693,\n", - " 'omop:4159949': 0.05128205128205128,\n", - " 'omop:4181322': 0.10256410256410256,\n", - " 'omop:4185115': 0.10256410256410256,\n", - " 'omop:4134598': 0.05128205128205128,\n", - " 'omop:4330850': 0.07692307692307693,\n", - " 'omop:4210771': 0.02564102564102564,\n", - " 'omop:4045162': 0.05128205128205128,\n", - " 'omop:4161828': 0.07692307692307693,\n", - " 'omop:4161058': 0.07692307692307693,\n", - " 'omop:4249123': 0.05128205128205128,\n", - " 'omop:4139008': 0.07692307692307693,\n", - " 'omop:4031321': 0.05128205128205128,\n", - " 'omop:4154279': 0.05128205128205128,\n", - " 'omop:4233946': 0.07692307692307693,\n", - " 'omop:4040721': 0.02564102564102564,\n", - " 'omop:4000756': 0.07692307692307693,\n", - " 'omop:4195673': 0.05128205128205128,\n", - " 'omop:4050134': 0.05128205128205128,\n", - " 'omop:4190070': 0.1794871794871795,\n", - " 'omop:4050128': 0.07692307692307693,\n", - " 'omop:4062347': 0.07692307692307693,\n", - " 'omop:4091623': 0.10256410256410256,\n", - " 'omop:4160912': 0.07692307692307693,\n", - " 'omop:46271049': 0.07692307692307693,\n", - " 'omop:4030028': 0.10256410256410256,\n", - " 'omop:4181193': 0.07692307692307693,\n", - " 'omop:4311041': 0.07692307692307693,\n", - " 'omop:4331725': 0.07692307692307693,\n", - " 'omop:4184453': 0.05128205128205128}" - ] + "text/plain": "{'omop:4027561': 0.05128205128205128,\n 'omop:2733612': 0.05128205128205128,\n 'omop:40489873': 0.07692307692307693,\n 'omop:4177089': 0.07692307692307693,\n 'omop:4302652': 0.10256410256410256,\n 'omop:4054559': 0.10256410256410256,\n 'omop:4324523': 0.07692307692307693,\n 
'omop:4301351': 0.1794871794871795,\n 'omop:4002031': 0.10256410256410256,\n 'omop:4148948': 0.1282051282051282,\n 'omop:4012185': 0.07692307692307693,\n 'omop:4159949': 0.05128205128205128,\n 'omop:4181322': 0.10256410256410256,\n 'omop:4185115': 0.10256410256410256,\n 'omop:4134598': 0.05128205128205128,\n 'omop:4330850': 0.07692307692307693,\n 'omop:4210771': 0.02564102564102564,\n 'omop:4045162': 0.05128205128205128,\n 'omop:4161828': 0.07692307692307693,\n 'omop:4161058': 0.07692307692307693,\n 'omop:4249123': 0.05128205128205128,\n 'omop:4139008': 0.07692307692307693,\n 'omop:4031321': 0.05128205128205128,\n 'omop:4154279': 0.05128205128205128,\n 'omop:4233946': 0.07692307692307693,\n 'omop:4040721': 0.02564102564102564,\n 'omop:4000756': 0.07692307692307693,\n 'omop:4195673': 0.05128205128205128,\n 'omop:4050134': 0.05128205128205128,\n 'omop:4190070': 0.1794871794871795,\n 'omop:4050128': 0.07692307692307693,\n 'omop:4062347': 0.07692307692307693,\n 'omop:4091623': 0.10256410256410256,\n 'omop:4160912': 0.07692307692307693,\n 'omop:46271049': 0.07692307692307693,\n 'omop:4030028': 0.10256410256410256,\n 'omop:4181193': 0.07692307692307693,\n 'omop:4311041': 0.07692307692307693,\n 'omop:4331725': 0.07692307692307693,\n 'omop:4184453': 0.05128205128205128}" }, - "execution_count": 13, + "execution_count": 39, "metadata": {}, "output_type": "execute_result" } @@ -568,10 +695,7 @@ "cell_type": "markdown", "id": "101e0269", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ "## Term metadata" @@ -579,37 +703,21 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 40, "id": "50f31a17", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:14:39.686649Z", - "start_time": "2023-08-19T02:14:35.034364Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:25:56.811388Z", + "start_time": 
"2023-09-15T04:25:52.143039Z" } }, "outputs": [ { "data": { - "text/plain": [ - "{'id': ['omop:4195673'],\n", - " 'omop:concept_class_id': ['Procedure'],\n", - " 'omop:concept_code': ['312644004'],\n", - " 'omop:domain_id': ['Procedure'],\n", - " 'omop:standard_concept': ['S'],\n", - " 'omop:valid_end_date': ['2099-12-31'],\n", - " 'omop:valid_start_date': ['2002-01-31'],\n", - " 'omop:vocabulary_id': ['SNOMED'],\n", - " 'rdfs:label': ['Angioplasty of posterior tibial artery'],\n", - " 'sh:prefix': ['omop'],\n", - " 'schema:url': ['https://athena.ohdsi.org/search-terms/terms/4195673'],\n", - " 'rdfs:isDefinedBy': ['https://athena.ohdsi.org/search-terms/terms/']}" - ] + "text/plain": "{'id': ['omop:4195673'],\n 'omop:concept_class_id': ['Procedure'],\n 'omop:concept_code': ['312644004'],\n 'omop:domain_id': ['Procedure'],\n 'omop:standard_concept': ['S'],\n 'omop:valid_end_date': ['2099-12-31'],\n 'omop:valid_start_date': ['2002-01-31'],\n 'omop:vocabulary_id': ['SNOMED'],\n 'rdfs:label': ['Angioplasty of posterior tibial artery'],\n 'sh:prefix': ['omop'],\n 'schema:url': ['https://athena.ohdsi.org/search-terms/terms/4195673'],\n 'rdfs:isDefinedBy': ['https://athena.ohdsi.org/search-terms/terms/']}" }, - "execution_count": 14, + "execution_count": 40, "metadata": {}, "output_type": "execute_result" } @@ -620,16 +728,13 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 41, "id": "fb74f7a7", "metadata": { - "ExecuteTime": { - "end_time": "2023-08-19T02:15:26.000340Z", - "start_time": "2023-08-19T02:15:25.994821Z" - }, "collapsed": false, - "jupyter": { - "outputs_hidden": false + "ExecuteTime": { + "end_time": "2023-09-15T04:26:01.804837Z", + "start_time": "2023-09-15T04:26:01.797258Z" } }, "outputs": [], @@ -641,10 +746,7 @@ "cell_type": "markdown", "id": "34a4367d-9271-4a78-949d-9905d59e5a7c", "metadata": { - "collapsed": false, - "jupyter": { - "outputs_hidden": false - } + "collapsed": false }, "source": [ "## Semantic Similarity using 
Rust" @@ -652,19 +754,29 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 42, "id": "62ccb499-dce3-4c4c-a5b0-836af733ea7f", - "metadata": {}, + "metadata": { + "ExecuteTime": { + "end_time": "2023-09-15T04:26:18.075123Z", + "start_time": "2023-09-15T04:26:13.741719Z" + } + }, "outputs": [], "source": [ - "adapter = get_adapter('semsimian:input/n3c.db')" + "simadapter = get_adapter('semsimian:input/n3c.db')" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 43, "id": "e0c59472-b6a3-47ec-abdc-93db1cbfde39", - "metadata": {}, + "metadata": { + "ExecuteTime": { + "end_time": "2023-09-15T04:26:18.080007Z", + "start_time": "2023-09-15T04:26:18.075959Z" + } + }, "outputs": [], "source": [ "terms1 = [\"omop:4195673\", \"omop:4000756\", \"omop:4002031\", \"omop:4012185\"]\n", @@ -673,50 +785,60 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 44, "id": "92807c91-8ca0-4a8a-bece-460c9799670e", - "metadata": {}, + "metadata": { + "ExecuteTime": { + "end_time": "2023-09-15T04:26:56.073662Z", + "start_time": "2023-09-15T04:26:56.071143Z" + } + }, "outputs": [], "source": [ - "tsps = adapter.termset_pairwise_similarity(terms1, terms2, predicates=[IS_A])" + "tsps = simadapter.termset_pairwise_similarity(terms1, terms2, predicates=[IS_A])" ] }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 45, "id": "6402925b-bd2a-4fe2-b17d-28607f3f972a", - "metadata": {}, + "metadata": { + "ExecuteTime": { + "end_time": "2023-09-15T04:26:56.096332Z", + "start_time": "2023-09-15T04:26:56.088150Z" + } + }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "subject_termset:\n", - " omop:4002031:\n", - " id: omop:4002031\n", - " label: Cardiovascular system repair\n", - " omop:4195673:\n", - " id: omop:4195673\n", - " label: Angioplasty of posterior tibial artery\n", " omop:4000756:\n", " id: omop:4000756\n", " label: Leg repair\n", + " omop:4002031:\n", + " id: omop:4002031\n", + " 
label: Cardiovascular system repair\n", " omop:4012185:\n", " id: omop:4012185\n", " label: Cardiovascular surgical procedure\n", + " omop:4195673:\n", + " id: omop:4195673\n", + " label: Angioplasty of posterior tibial artery\n", "object_termset:\n", " omop:4050128:\n", " id: omop:4050128\n", " label: Angioplasty of artery\n", + " omop:4054559:\n", + " id: omop:4054559\n", + " label: Repair of blood vessel\n", " omop:4030028:\n", " id: omop:4030028\n", " label: Surgical procedure on lower extremity\n", " omop:4050134:\n", " id: omop:4050134\n", " label: Angioplasty of crural artery\n", - " omop:4054559:\n", - " id: omop:4054559\n", - " label: Repair of blood vessel\n", "subject_best_matches:\n", " omop:4000756:\n", " match_source: omop:4000756\n", @@ -780,15 +902,15 @@ " score: 9.734760203165433\n", " similarity:\n", " subject_id: omop:4030028\n", - " object_id: omop:4000756\n", + " object_id: omop:4195673\n", " ancestor_id: omop:4030028\n", " ancestor_label: Surgical procedure on lower extremity\n", " ancestor_information_content: 9.734760203165433\n", - " jaccard_similarity: 0.375\n", - " phenodigm_score: 1.910637348160827\n", + " jaccard_similarity: 0.12\n", + " phenodigm_score: 1.0808197002182427\n", " match_source_label: Surgical procedure on lower extremity\n", - " match_target: omop:4000756\n", - " match_target_label: Leg repair\n", + " match_target: omop:4195673\n", + " match_target_label: Angioplasty of posterior tibial artery\n", " omop:4050128:\n", " match_source: omop:4050128\n", " score: 11.720054065584758\n", @@ -833,8 +955,7 @@ " match_target_label: Angioplasty of posterior tibial artery\n", "average_score: 12.13040293072116\n", "best_score: 18.911289573272484\n", - "metric: ancestor_information_content\n", - "\n" + "metric: ancestor_information_content\n" ] } ], @@ -842,13 +963,491 @@ "print(yaml_dumper.dumps(tsps))" ] }, + { + "cell_type": "markdown", + "source": [ + "## Mondo Mappings\n", + "\n", + "We will now demonstrate mapping to Mondo. 
Mondo has mappings to many sources, but not to OMOP.\n", + "\n", + "The n3c OMOP does, however, have mappings to SNOMED (structured in an unusual way in the OWL),\n", + "so we can use these as a join point to Mondo." + ], + "metadata": { + "collapsed": false + }, + "id": "168eb5ef59b452e8" + }, + { + "cell_type": "code", + "execution_count": 49, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Parkinson's disease\n" + ] + } + ], + "source": [ + "PD = \"omop:381270\"\n", + "print(adapter.label(PD))" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T04:59:42.188798Z", + "start_time": "2023-09-15T04:59:42.168952Z" + } + }, + "id": "ec1958f2eb7fa093" + }, + { + "cell_type": "code", + "execution_count": 55, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['SCTID:49049000']\n" + ] + } + ], + "source": [ + "def omop_mappings(id):\n", + "    # we SHOULD be able to just do this:\n", + "    # adapter.mappings([id], ...)\n", + "    # however, mappings are stored in the upstream OWL in a non-standard way\n", + "    return [f\"SCTID:{code}\" for code in adapter.entity_metadata_map(id).get(\"omop:concept_code\", [])]\n", + "\n", + "print(omop_mappings(PD))\n", + " " + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T05:14:35.107856Z", + "start_time": "2023-09-15T05:14:31.182616Z" + } + }, + "id": "2f8af69fa46358ff" + }, { "cell_type": "code", - "execution_count": null, - "id": "83dd744e-bad4-4a1d-80e7-46c77019bf6b", - "metadata": {}, + "execution_count": 56, "outputs": [], - "source": [] + "source": [ + "mondo = get_adapter(\"sqlite:obo:mondo\")" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T05:14:37.745116Z", + "start_time": "2023-09-15T05:14:37.735342Z" + } + }, + "id": "1487489855235878" + }, + { + "cell_type": "code", + "execution_count": 60, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + 
"text": [ + "MONDO:0005180 oio:hasDbXref SCTID:49049000 omop:381270\n" + ] + } + ], + "source": [ + "for m in mondo.sssom_mappings(omop_mappings(PD)):\n", + " print(m.subject_id,m.predicate_id, m.object_id, PD)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T05:17:11.448720Z", + "start_time": "2023-09-15T05:17:07.161797Z" + } + }, + "id": "20bd128d90378cd0" + }, + { + "cell_type": "code", + "execution_count": 65, + "outputs": [], + "source": [ + "def transitive_mappings(omop_id):\n", + " for m in mondo.sssom_mappings(omop_mappings(omop_id)):\n", + " for m2 in mondo.sssom_mappings([m.subject_id]):\n", + " yield omop_id, m.subject_id, m.object_id, m2.object_id" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T05:20:35.356850Z", + "start_time": "2023-09-15T05:20:35.346694Z" + } + }, + "id": "a492acf664c08e46" + }, + { + "cell_type": "code", + "execution_count": 66, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'DOID:14330')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'EFO:0002508')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'ICD9:332')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'ICD9:332.0')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'MESH:D010300')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'NCIT:C26845')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'NIFSTD:birnlex_2098')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'OMIMPS:168600')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'Orphanet:319705')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'SCTID:49049000')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'UMLS:C0030567')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', '')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'NCIT:C26845')\n", + 
"('omop:381270', 'MONDO:0005180', 'SCTID:49049000', 'DOID:14330')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', '')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', '')\n", + "('omop:381270', 'MONDO:0005180', 'SCTID:49049000', '')\n" + ] + } + ], + "source": [ + "for m in transitive_mappings(PD):\n", + " print(m)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T05:20:40.039451Z", + "start_time": "2023-09-15T05:20:35.546516Z" + } + }, + "id": "a385db042406062f" + }, + { + "cell_type": "markdown", + "source": [ + "## Value Set Expansion" + ], + "metadata": { + "collapsed": false + }, + "id": "879cbe741f9fb617" + }, + { + "cell_type": "code", + "execution_count": 80, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "default_resolver:\r\n", + " name: omop\r\n", + " shorthand: sqlite:input/n3c.db\r\n" + ] + } + ], + "source": [ + "!cat input/n3c_config.yaml" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:27:49.893020Z", + "start_time": "2023-09-15T14:27:49.703855Z" + } + }, + "id": "f0e6a9ffefc7eb88" + }, + { + "cell_type": "code", + "execution_count": 83, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "id: https://w3id.org/linkml/examples/enums\r\n", + "title: Dynamic Enums Example\r\n", + "name: dynamicenums-example\r\n", + "description: This demonstrates the use of dynamic enums\r\n", + "license: https://creativecommons.org/publicdomain/zero/1.0/\r\n", + "\r\n", + "prefixes:\r\n", + " linkml: https://w3id.org/linkml/\r\n", + " ex: https://w3id.org/linkml/examples/enums/\r\n", + " sh: https://w3id.org/shacl/\r\n", + " bioregistry: https://bioregistry.io/registry/\r\n", + " MONDO: http://purl.obolibrary.org/obo/MONDO_\r\n", + " omop: https://athena.ohdsi.org/search-terms/terms/\r\n", + " loinc: http://loinc.org/\r\n", + "\r\n", + "default_prefix: ex\r\n", + "default_range: string\r\n", + "\r\n", + 
"imports:\r\n", + " - linkml:types\r\n", + "\r\n", + "\r\n", + "enums:\r\n", + " Synucleinopathies:\r\n", + " reachable_from:\r\n", + " include_self: true\r\n", + " source_ontology: local:omop\r\n", + " source_nodes:\r\n", + " - omop:37203944\r\n" + ] + } + ], + "source": [ + "!cat input/n3c-example-intensional-value-set.yaml" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:28:40.274869Z", + "start_time": "2023-09-15T14:28:40.141236Z" + } + }, + "id": "b228c23ad4b2f982" + }, + { + "cell_type": "code", + "execution_count": 84, + "outputs": [], + "source": [ + "from oaklib.utilities.subsets.value_set_expander import ValueSetExpander\n", + "\n", + "expander = ValueSetExpander()" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:28:40.951582Z", + "start_time": "2023-09-15T14:28:40.947598Z" + } + }, + "id": "60f5ec5f1be37d9b" + }, + { + "cell_type": "code", + "execution_count": 85, + "outputs": [], + "source": [ + "from oaklib.datamodels.value_set_configuration import ValueSetConfiguration\n", + "from linkml_runtime.loaders import yaml_loader\n", + "\n", + "expander.configuration = yaml_loader.load(\"input/n3c_config.yaml\", target_class=ValueSetConfiguration)" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:28:41.489376Z", + "start_time": "2023-09-15T14:28:41.482066Z" + } + }, + "id": "a022ea4976c741d2" + }, + { + "cell_type": "code", + "execution_count": 86, + "outputs": [], + "source": [ + "expander.expand_in_place(\n", + " schema_path=\"input/n3c-example-intensional-value-set.yaml\", value_set_names=[\"Synucleinopathies\"], \n", + " output_path=\"output/synucleinopathies.yaml\"\n", + " )" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:28:44.838681Z", + "start_time": "2023-09-15T14:28:42.327079Z" + } + }, + "id": "523bb528b10569c8" + }, + { + "cell_type": "code", + "execution_count": 87, + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "id: https://w3id.org/linkml/examples/enums\r\n", + "title: Dynamic Enums Example\r\n", + "name: dynamicenums-example\r\n", + "description: This demonstrates the use of dynamic enums\r\n", + "license: https://creativecommons.org/publicdomain/zero/1.0/\r\n", + "\r\n", + "prefixes:\r\n", + " linkml: https://w3id.org/linkml/\r\n", + " ex: https://w3id.org/linkml/examples/enums/\r\n", + " sh: https://w3id.org/shacl/\r\n", + " bioregistry: https://bioregistry.io/registry/\r\n", + " MONDO: http://purl.obolibrary.org/obo/MONDO_\r\n", + " omop: https://athena.ohdsi.org/search-terms/terms/\r\n", + " loinc: http://loinc.org/\r\n", + "\r\n", + "default_prefix: ex\r\n", + "default_range: string\r\n", + "\r\n", + "imports:\r\n", + "- linkml:types\r\n", + "\r\n", + "\r\n", + "enums:\r\n", + " Synucleinopathies:\r\n", + " reachable_from:\r\n", + " include_self: true\r\n", + " source_ontology: local:omop\r\n", + " source_nodes:\r\n", + " - omop:37203944\r\n", + " permissible_values:\r\n", + " omop:44800441:\r\n", + " text: omop:44800441\r\n", + " description: '[X]Parkinsonism in diseases classified elsewhere'\r\n", + " meaning: omop:44800441\r\n", + " omop:380701:\r\n", + " text: omop:380701\r\n", + " description: Diffuse Lewy body disease\r\n", + " meaning: omop:380701\r\n", + " omop:1340428:\r\n", + " text: omop:1340428\r\n", + " description: Exacerbation of Parkinson's disease\r\n", + " meaning: omop:1340428\r\n", + " omop:4219273:\r\n", + " text: omop:4219273\r\n", + " description: Parkinsonian syndrome with idiopathic orthostatic hypotension\r\n", + " meaning: omop:4219273\r\n", + " omop:4196433:\r\n", + " text: omop:4196433\r\n", + " description: Senile dementia of the Lewy body type\r\n", + " meaning: omop:4196433\r\n", + " omop:1340521:\r\n", + " text: omop:1340521\r\n", + " description: Progression of pure autonomic failure\r\n", + " meaning: omop:1340521\r\n", + " omop:40485457:\r\n", + " text: 
omop:40485457\r\n", + " description: Multiple system atrophy, Parkinson's variant\r\n", + " meaning: omop:40485457\r\n", + " omop:37396747:\r\n", + " text: omop:37396747\r\n", + " description: Autosomal dominant late onset Parkinson disease\r\n", + " meaning: omop:37396747\r\n", + " omop:4100236:\r\n", + " text: omop:4100236\r\n", + " description: Parkinsonism with orthostatic hypotension\r\n", + " meaning: omop:4100236\r\n", + " omop:36713737:\r\n", + " text: omop:36713737\r\n", + " description: Orthostatic hypotension co-occurrent and due to Parkinson's disease\r\n", + " meaning: omop:36713737\r\n", + " omop:381270:\r\n", + " text: omop:381270\r\n", + " description: Parkinson's disease\r\n", + " meaning: omop:381270\r\n", + " omop:4178618:\r\n", + " text: omop:4178618\r\n", + " description: Diffuse Lewy body disease with spongiform cortical change\r\n", + " meaning: omop:4178618\r\n", + " omop:37203944:\r\n", + " text: omop:37203944\r\n", + " description: Synucleinopathy\r\n", + " meaning: omop:37203944\r\n", + " omop:37110776:\r\n", + " text: omop:37110776\r\n", + " description: Atypical juvenile parkinsonism\r\n", + " meaning: omop:37110776\r\n", + " omop:37110499:\r\n", + " text: omop:37110499\r\n", + " description: Sporadic Parkinson disease\r\n", + " meaning: omop:37110499\r\n", + " omop:608078:\r\n", + " text: omop:608078\r\n", + " description: Autosomal recessive familial Parkinson disease\r\n", + " meaning: omop:608078\r\n", + " omop:37399497:\r\n", + " text: omop:37399497\r\n", + " description: Early onset parkinsonism and intellectual disability syndrome\r\n", + " meaning: omop:37399497\r\n", + " omop:37395785:\r\n", + " text: omop:37395785\r\n", + " description: Young onset Parkinson disease\r\n", + " meaning: omop:37395785\r\n", + " omop:44782763:\r\n", + " text: omop:44782763\r\n", + " description: Lewy body dementia with behavioral disturbance\r\n", + " meaning: omop:44782763\r\n", + " omop:4044053:\r\n", + " text: omop:4044053\r\n", + " 
description: Multiple system atrophy\r\n", + " meaning: omop:4044053\r\n", + " omop:40484594:\r\n", + " text: omop:40484594\r\n", + " description: Multiple system atrophy, cerebellar variant\r\n", + " meaning: omop:40484594\r\n", + " omop:4309357:\r\n", + " text: omop:4309357\r\n", + " description: Pure autonomic failure\r\n", + " meaning: omop:4309357\r\n", + " omop:4047751:\r\n", + " text: omop:4047751\r\n", + " description: Juvenile Parkinson's disease\r\n", + " meaning: omop:4047751\r\n" + ] + } + ], + "source": [ + "!cat output/synucleinopathies.yaml" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "end_time": "2023-09-15T14:29:00.132414Z", + "start_time": "2023-09-15T14:28:59.962035Z" + } + }, + "id": "76badc2323d4088b" + }, + { + "cell_type": "markdown", + "source": [ + "## Command Line Value Set Expansion" + ], + "metadata": { + "collapsed": false + }, + "id": "2669df54a24fbd9d" + }, + { + "cell_type": "code", + "execution_count": 73, + "outputs": [], + "source": [ + "!vskit expand -c input/n3c_config.yaml -s input/n3c-example-intensional-value-set.yaml Synucleinopathies -o output/synucleinopathies.yaml" + ], + "metadata": { + "collapsed": false, + "ExecuteTime": { + "start_time": "2023-09-15T05:49:31.599998Z" + } + }, + "id": "12cc6be422e2d8ad" } ], "metadata": {