Adding more docs on how to use LLMs with OAK #819

Merged 2 commits on Oct 22, 2024
168 changes: 168 additions & 0 deletions docs/examples/Adapters/LLM/LLM-Tutorial.ipynb
@@ -0,0 +1,168 @@
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": [
"# LLM Tutorial\n",
"\n",
"This tutorial walks through using OAK via an LLM wrapper adapter.\n",
"\n",
"See also the [how-to guide](https://incatools.github.io/ontology-access-kit/howtos/use-llms.html).\n",
"\n",
"For this to work, you must either install OAK with the llm extras or install the `llm` tool separately\n",
"(e.g. `pipx install llm`).\n",
"\n",
"You will also need the API keys for an LLM service, or a proxy to a local model.\n",
"\n",
" "
],
"id": "cf2572dda785deed"
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Annotate Command\n",
"\n",
"Note that the first run may be slow, because it needs to perform an initial embedding.\n",
"\n",
"Here we use the standard OAK `annotate` command, but instead of the usual adapter (e.g. `sqlite:obo:cl`), we pass in a wrapped adapter, using the `gpt-4o` model.\n",
"\n",
"We strongly recommend passing in categories, as this helps the model ground the kinds of terms you are interested in."
],
"id": "95ff062ec749f629"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:00:12.384637Z",
"start_time": "2024-10-22T01:59:41.305531Z"
}
},
"cell_type": "code",
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate \"sequencing was performed on splenic and thymic macrophages\" --category CellType \n",
"id": "8044c89577c9625",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object_id: CL:0000871\r\n",
"object_label: splenic macrophage\r\n",
"object_categories:\r\n",
"- CellType\r\n",
"subject_label: splenic macrophages\r\n",
"\r\n",
"---\r\n",
"object_id: CL:0000866\r\n",
"object_label: thymic macrophage\r\n",
"object_categories:\r\n",
"- CellType\r\n",
"subject_label: thymic macrophages\r\n",
"start: 40\r\n",
"end: 58\r\n"
]
}
],
"execution_count": 3
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"Currently, span coordinates are only returned for concepts that can be clearly mapped back to the text.\n",
"\n",
"You can also use the standard `--whole-text` (`-W`) option to match the entire text as a single span, rather than annotating segments within it:"
],
"id": "a014e2a04badc986"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:04:59.758809Z",
"start_time": "2024-10-22T02:04:43.271571Z"
}
},
"cell_type": "code",
"source": "!runoak --stacktrace -i llm:{gpt-4o}:sqlite:obo:cl annotate -W \"macrophage found in the thymus\" --category CellType ",
"id": "4cdcabaad7e6268e",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"object_id: CL:0000866\r\n",
"object_label: thymic macrophage\r\n",
"subject_label: macrophage found in the thymus\r\n"
]
}
],
"execution_count": 6
},
{
"metadata": {},
"cell_type": "markdown",
"source": [
"## Suggesting Definitions\n",
"\n"
],
"id": "7f06875f274fbd06"
},
{
"metadata": {
"ExecuteTime": {
"end_time": "2024-10-22T02:02:45.880910Z",
"start_time": "2024-10-22T02:02:27.963766Z"
}
},
"cell_type": "code",
"source": [
"!runoak -i llm:sqlite:obo:uberon generate-definitions \\\n",
" finger toe \\\n",
" --style-hints \"write definitions in formal genus-differentia form\""
],
"id": "3a9f92f9e258b301",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"add definition 'A manual digit is a type of anatomical structure characterized as one of the distal appendages found on the human hand, distinct from those structures on other limbs, and is primarily comprised of phalanges, a metacarpal bone, and associated soft tissue.' to UBERON:0002389\r\n",
"add definition 'A pedal digit is a type of anatomical structure that is a subdivision of the limb and is specifically located at the distal end of the pes, commonly known as the foot, in vertebrates.' to UBERON:0001466\r\n"
]
}
],
"execution_count": 5
},
{
"metadata": {},
"cell_type": "code",
"outputs": [],
"execution_count": null,
"source": "",
"id": "3f64b6dc3ae0a288"
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
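The `start` and `end` values in the notebook's `annotate` output can be checked against the input text. The following is a minimal sketch and not part of OAK: `parse_annotations` and `matched_span` are hypothetical helpers written for this illustration, the hand-rolled parser only handles the flat records shown in the notebook output (a YAML library would be the more robust choice), and treating the offsets as 0-based and end-exclusive is an assumption based on this single example.

```python
from typing import Dict, List, Optional

# Input text and stdout copied from the first `annotate` cell of the notebook.
TEXT = "sequencing was performed on splenic and thymic macrophages"
STDOUT = """object_id: CL:0000871
object_label: splenic macrophage
object_categories:
- CellType
subject_label: splenic macrophages

---
object_id: CL:0000866
object_label: thymic macrophage
object_categories:
- CellType
subject_label: thymic macrophages
start: 40
end: 58
"""


def parse_annotations(stdout: str) -> List[Dict[str, object]]:
    """Parse '---'-separated flat key/value records (hypothetical helper)."""
    records = []
    for chunk in stdout.replace("\r\n", "\n").split("\n---\n"):
        record: Dict[str, object] = {}
        last_key = None
        for line in chunk.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("- ") and last_key is not None:
                # List item belonging to the most recent key with no value.
                record[last_key].append(line[2:])
                continue
            # Split on the first colon only, so CURIEs such as CL:0000866 survive.
            key, _, value = line.partition(":")
            value = value.strip()
            last_key = key
            if not value:
                record[key] = []
            elif key in ("start", "end"):
                record[key] = int(value)
            else:
                record[key] = value
        if record:
            records.append(record)
    return records


def matched_span(text: str, record: Dict[str, object]) -> Optional[str]:
    """Return the slice of text covered by a record, or None without coordinates."""
    start, end = record.get("start"), record.get("end")
    if start is None or end is None:
        return None
    # Assumption: offsets are 0-based and end-exclusive, like a Python slice.
    return text[start:end]


records = parse_annotations(STDOUT)
for rec in records:
    print(rec["object_id"], matched_span(TEXT, rec))
```

Records that carry no coordinates, such as the first one in this output, yield `None` instead of a slice.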
1 change: 1 addition & 0 deletions docs/examples/Adapters/index.rst
@@ -7,3 +7,4 @@ Adapter Examples
:maxdepth: 2

Ubergraph/Ubergraph-Tutorial
LLM/LLM-Tutorial
4 changes: 3 additions & 1 deletion docs/howtos/use-llms.rst
@@ -66,6 +66,8 @@ LLM CLI tools such as the datasette ``llm`` tool pair naturally
OAK LLM Adapter
---------------

See also the `LLM Notebook <https://incatools.github.io/ontology-access-kit/examples/Adapters/LLM/LLM-Tutorial.html>`_.

OAK provides a number of different adapters (implementations) for each of its interfaces.
Some adapters provide direct access to an ontology or collection of ontologies; others act as *wrappers*
onto another adapter, and inject additional functionality.
@@ -246,7 +248,7 @@ Then you can use the model in OAK:
Mixtral via groq and LiteLLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`groq <https://groq.com/>` provides an API over souped-up hardware running Llama2 and Mixtral.
`groq <https://groq.com/>`_ provides an API over souped-up hardware running Llama2 and Mixtral.
You can configure in a similar way to ollama above, but here we are proxying to a remote server:

.. code-block:: bash