Adding definition validation functionality (#738)
* Adding definition validation functionality

* handle whole-ontology case

* add missing

* validate synonyms
cmungall authored Apr 16, 2024
1 parent 0a734a8 commit a52eed5
Showing 32 changed files with 6,548 additions and 3,967 deletions.
50 changes: 49 additions & 1 deletion docs/howtos/use-llms.rst
@@ -115,9 +115,51 @@ Suggesting Definitions
        finger toe \
        --style-hints "write definitions in formal genus-differentia form"

Validating Definitions
~~~~~~~~~~~~~~~~~~~~~~

The LLM adapter currently implements ``validate-definitions`` by comparing the specified definition
against the abstracts of papers cited in the definition provenance, or against the database
objects that are cited as definition provenance.

Here is an example of validating definitions for GO terms:

.. code-block:: bash

    runoak --stacktrace -i llm:sqlite:obo:go validate-definitions \
        i^GO: -o out.jsonl -O jsonl

The semsql version of GO has other ontologies merged in, so the ``i^GO:`` query restricts
validation to actual GO terms.
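
The command above writes its results to ``out.jsonl``, one JSON object per line. The following is a minimal sketch of reading such a file in Python. It makes no assumption about the real field names, which are defined by the OAK validation data model; the sample records and their keys (``subject``, ``info``) are made up for illustration and are not real OAK output.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of JSON Lines input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical records; the real keys may differ.
sample = [
    '{"subject": "GO:0000001", "info": "example message"}',
    "",
    '{"subject": "GO:0000002", "info": "another message"}',
]
records = list(read_jsonl(sample))
print(len(records))
```

To read the real file, pass an open file handle instead of the sample list, for example ``read_jsonl(open("out.jsonl"))``.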

You can also pass in a configuration object, which should conform to the
`Validation Data Model <https://w3id.org/oak/validation-datamodel>`_.

For example, this configuration YAML provides a specific prompt and also a URL for
documentation aimed at ontology developers.

.. code-block:: yaml

    prompt_info: Please also use the following GO guidelines
    documentation_objects:
      - https://wiki.geneontology.org/Guidelines_for_GO_textual_definitions

All specified URLs are downloaded and converted to text and included in the prompt.
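
As a rough illustration of that step, the sketch below combines ``prompt_info`` with the text of each entry in ``documentation_objects``. This is not OAK's implementation: ``build_prompt`` and ``fake_fetch`` are hypothetical names, the separator format is an arbitrary choice, and the fetcher is a stand-in so the example runs offline.

```python
from typing import Callable, Dict, List


def build_prompt(config: Dict[str, object], fetch_text: Callable[[str], str]) -> str:
    """Combine prompt_info with the text of each documentation object."""
    parts: List[str] = [str(config.get("prompt_info", ""))]
    for url in config.get("documentation_objects", []):
        # Label each document with its source URL (an arbitrary format).
        parts.append(f"--- {url} ---\n{fetch_text(url)}")
    return "\n\n".join(part for part in parts if part)


# The example configuration, written as a Python dict.
config = {
    "prompt_info": "Please also use the following GO guidelines",
    "documentation_objects": [
        "https://wiki.geneontology.org/Guidelines_for_GO_textual_definitions"
    ],
}


def fake_fetch(url: str) -> str:
    """Stand-in fetcher; a real one would download the page and strip the HTML."""
    return "(text of the guidelines page)"


prompt = build_prompt(config, fake_fetch)
print(prompt)
```

Passing the fetcher in as a parameter keeps the download step separate from prompt assembly, so the assembly logic can be exercised without network access.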

The configuration yaml is passed in as follows:

.. code-block:: bash

    runoak --stacktrace -i llm:{claude-3-opus}:sqlite:obo:go validate-definitions \
        -C src/oaklib/conf/go-definition-validation-llm-config.yaml i^GO: -O yaml

Validating Mappings
~~~~~~~~~~~~~~~~~~~

The LLM adapter validates mappings by looking up information about the mapped entity and
comparing it with the main entity.

.. code-block:: bash

    runoak --stacktrace -i llm:{gpt-4}:sqlite:obo:go validate-mappings \
@@ -165,8 +207,14 @@ as a developer, then you can do this:
This will install the plugin in the same environment as OAK.

TODO: instructions for non-developers.
If you need to update this:

.. code-block:: bash

    cd ontology-access-kit
    poetry run llm install -U llm-gemini

TODO: instructions for non-developers.

Mixtral via Ollama and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
28 changes: 24 additions & 4 deletions docs/packages/interfaces/validator.rst
@@ -3,13 +3,24 @@
Validator Interface
--------------------

.. warning ::
The Validator Interface provides access to a number of different validation operations over ontologies.

Currently the main validator methods are only implemented for :ref:`SqlDatabaseImplementation`
The notion of validation in OAK is intentionally very flexible, and may encompass:

The validate method is configured using a *metadata schema*. The default one used is:
* *Schema* validation, for example, checking definitions are strings and have 0..1 cardinality.
* *Logical* validation, using a reasoner.
* *Lexical* validation, for example, ensuring there are no spelling errors.
* *Stylistic* validation, against a style guide.
* *Content* validation, checking the content of the ontology against domain knowledge or other ontologies.

- `Ontology Metadata <https://incatools.github.io/ontology-access-kit/datamodels/ontology-metadata/index.html>`_
Different adapters may implement different portions of this.

Schema Validation
~~~~~~~~~~~~~~~~~

The core validate method is configured using a *metadata schema*. The default one used is:

- `Ontology Metadata <https://w3id.org/oak/ontology-metadata>`_

This is specified using LinkML, which provides an expressive way to state constraints on metadata elements,
such as :ref:`AnnotationProperty` assertions in ontologies. For example, this schema states that definition
@@ -19,6 +30,15 @@ Different projects may wish to configure this - it is possible to pass in a diff

For more details see `this howto guide <https://incatools.github.io/ontology-access-kit/howtos/validate-an-obo-ontology>`_
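
To make the idea of a schema constraint concrete, here is a minimal sketch of the kind of check described for definitions (values must be strings, with 0..1 cardinality). It does not use OAK or LinkML; the function name, the dictionary layout, and the ``EX:`` term IDs are hypothetical and only illustrate the rule.

```python
from typing import Dict, List


def check_definition_cardinality(definitions: Dict[str, List[object]]) -> List[str]:
    """Return messages for terms whose definitions break the 0..1 string rule."""
    problems = []
    for term_id, values in definitions.items():
        if len(values) > 1:
            problems.append(
                f"{term_id}: expected at most 1 definition, found {len(values)}"
            )
        for value in values:
            if not isinstance(value, str):
                problems.append(f"{term_id}: definition is not a string: {value!r}")
    return problems


# Hypothetical terms and definitions, for illustration only.
example = {
    "EX:1": ["A well-formed definition."],
    "EX:2": [],
    "EX:3": ["First definition.", "Second definition."],
    "EX:4": [42],
}
problems = check_definition_cardinality(example)
for message in problems:
    print(message)
```

In this sketch a term with no definition passes, because the lower bound of the cardinality is zero; only multiple definitions or non-string values are reported.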

.. warning::

    Currently only implemented for :ref:`sql_implementation`

LLM-based validation
~~~~~~~~~~~~~~~~~~~~

See :ref:`use_llms`


.. currentmodule:: oaklib.interfaces.validator_interface

3 changes: 2 additions & 1 deletion docs/packages/utilities.rst
@@ -17,5 +17,6 @@ being turned into :ref:`interfaces`.
lexical.lexical_indexer
subsets.slimmer_utils
apikey_manager
taxon/taxon_constraint_utils
taxon.taxon_constraint_utils
table_filler

681 changes: 681 additions & 0 deletions notebooks/Commands/ValidateDefinitions.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion notebooks/Commands/ValidateMappings.ipynb
@@ -5,7 +5,7 @@
"id": "0a28b88d-4deb-4d0a-a110-f27adf077e23",
"metadata": {},
"source": [
"# OAK apply command\n",
"# OAK validate-mappings command\n",
"\n",
"This notebook is intended as a supplement to the [main OAK CLI docs](https://incatools.github.io/ontology-access-kit/cli.html).\n",
"\n",
1 change: 1 addition & 0 deletions notebooks/Commands/input/validate-definition-conf.yaml
@@ -0,0 +1 @@
lookup_references: true