From 55662181136c4b8e8cab71bd67decd6144f46739 Mon Sep 17 00:00:00 2001 From: Shilpa Kancharla Date: Tue, 21 May 2024 14:24:23 -0700 Subject: [PATCH 1/6] Add example for adding context information for prompting --- .../Adding_context_information.ipynb | 152 ++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 examples/prompting/Adding_context_information.ipynb diff --git a/examples/prompting/Adding_context_information.ipynb b/examples/prompting/Adding_context_information.ipynb new file mode 100644 index 000000000..eb5b62ac6 --- /dev/null +++ b/examples/prompting/Adding_context_information.ipynb @@ -0,0 +1,152 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Gemini API: Adding context information" + ], + "metadata": { + "id": "sP8PQnz1QrcF" + } + }, + { + "cell_type": "markdown", + "source": [ + "\n", + " \n", + "
\n", + " Run in Google Colab\n", + "
" + ], + "metadata": { + "id": "bxGr_x3MRA0z" + } + }, + { + "cell_type": "markdown", + "source": [ + "While LLMs are trained extensively on various documents and data, the LLM does not know everything. New information or information that is not easily accessible cannot be known by the LLM, unless it was specifically added to its corpus of knowledge somehow. For this reason, it is sometimes necessary to provide the LLM, with information and context necessary to answer our queries by providing additional context." + ], + "metadata": { + "id": "ysy--KfNRrCq" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install -U -q google-generativeai" + ], + "metadata": { + "id": "Ne-3gnXqR0hI" + }, + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "EconMHePQHGw" + }, + "outputs": [], + "source": [ + "import google.generativeai as genai\n", + "\n", + "from IPython.display import Markdown" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Configure your API key\n", + "\n", + "To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." 
+ ], + "metadata": { + "id": "eomJzCa6lb90" + } + }, + { + "cell_type": "code", + "source": [ + "from google.colab import userdata\n", + "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n", + "\n", + "genai.configure(api_key=GOOGLE_API_KEY)" + ], + "metadata": { + "id": "v-JZzORUpVR2" + }, + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Example" + ], + "metadata": { + "id": "JljcHgI2ltTY" + } + }, + { + "cell_type": "code", + "source": [ + "# the list as of April 2024\n", + "prompt = \"\"\"\n", + "QUERY: provide a list of atheletes that competed in olympics exactly 9 times.\n", + "CONTEXT:\n", + "Ian Millar, 10\n", + "Hubert Raudaschl, 9\n", + "Afanasijs Kuzmins, 9\n", + "Nino Salukvadze, 9\n", + "Piero d'Inzeo, 8\n", + "Raimondo d'Inzeo, 8\n", + "Claudia Pechstein, 8\n", + "Jaqueline Mourão, 8\n", + "Ivan Osiier, 7\n", + "François Lafortune, Jr, 7\n", + "\n", + "ANSWER:\"\"\"\n", + "model = genai.GenerativeModel(model_name='gemini-1.5-flash-latest', generation_config={\"temperature\": 0})\n", + "Markdown(model.generate_content(prompt).text)" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 104 + }, + "id": "uFcm6Dd7ls_F", + "outputId": "37628141-885c-4cc4-dcd4-3c340af3a574" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ], + "text/markdown": "The list you provided already includes all the athletes who competed in the Olympics exactly 9 times:\n\n* **Hubert Raudaschl**\n* **Afanasijs Kuzmins**\n* **Nino Salukvadze** \n" + }, + "metadata": {}, + "execution_count": 5 + } + ] + } + ] +} \ No newline at end of file From ae93721521bd103a8a23cac9b5576e97082c9033 Mon Sep 17 00:00:00 2001 From: Shilpa Kancharla Date: Tue, 21 May 2024 14:38:48 -0700 Subject: [PATCH 2/6] delete --- .../Adding_context_information.ipynb | 152 ------------------ 1 file changed, 152 deletions(-) delete mode 100644 
examples/prompting/Adding_context_information.ipynb diff --git a/examples/prompting/Adding_context_information.ipynb b/examples/prompting/Adding_context_information.ipynb deleted file mode 100644 index eb5b62ac6..000000000 --- a/examples/prompting/Adding_context_information.ipynb +++ /dev/null @@ -1,152 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Gemini API: Adding context information" - ], - "metadata": { - "id": "sP8PQnz1QrcF" - } - }, - { - "cell_type": "markdown", - "source": [ - "\n", - " \n", - "
\n", - " Run in Google Colab\n", - "
" - ], - "metadata": { - "id": "bxGr_x3MRA0z" - } - }, - { - "cell_type": "markdown", - "source": [ - "While LLMs are trained extensively on various documents and data, the LLM does not know everything. New information or information that is not easily accessible cannot be known by the LLM, unless it was specifically added to its corpus of knowledge somehow. For this reason, it is sometimes necessary to provide the LLM, with information and context necessary to answer our queries by providing additional context." - ], - "metadata": { - "id": "ysy--KfNRrCq" - } - }, - { - "cell_type": "code", - "source": [ - "!pip install -U -q google-generativeai" - ], - "metadata": { - "id": "Ne-3gnXqR0hI" - }, - "execution_count": 1, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "id": "EconMHePQHGw" - }, - "outputs": [], - "source": [ - "import google.generativeai as genai\n", - "\n", - "from IPython.display import Markdown" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Configure your API key\n", - "\n", - "To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." 
- ], - "metadata": { - "id": "eomJzCa6lb90" - } - }, - { - "cell_type": "code", - "source": [ - "from google.colab import userdata\n", - "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n", - "\n", - "genai.configure(api_key=GOOGLE_API_KEY)" - ], - "metadata": { - "id": "v-JZzORUpVR2" - }, - "execution_count": 3, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Example" - ], - "metadata": { - "id": "JljcHgI2ltTY" - } - }, - { - "cell_type": "code", - "source": [ - "# the list as of April 2024\n", - "prompt = \"\"\"\n", - "QUERY: provide a list of atheletes that competed in olympics exactly 9 times.\n", - "CONTEXT:\n", - "Ian Millar, 10\n", - "Hubert Raudaschl, 9\n", - "Afanasijs Kuzmins, 9\n", - "Nino Salukvadze, 9\n", - "Piero d'Inzeo, 8\n", - "Raimondo d'Inzeo, 8\n", - "Claudia Pechstein, 8\n", - "Jaqueline Mourão, 8\n", - "Ivan Osiier, 7\n", - "François Lafortune, Jr, 7\n", - "\n", - "ANSWER:\"\"\"\n", - "model = genai.GenerativeModel(model_name='gemini-1.5-flash-latest', generation_config={\"temperature\": 0})\n", - "Markdown(model.generate_content(prompt).text)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 104 - }, - "id": "uFcm6Dd7ls_F", - "outputId": "37628141-885c-4cc4-dcd4-3c340af3a574" - }, - "execution_count": 5, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "" - ], - "text/markdown": "The list you provided already includes all the athletes who competed in the Olympics exactly 9 times:\n\n* **Hubert Raudaschl**\n* **Afanasijs Kuzmins**\n* **Nino Salukvadze** \n" - }, - "metadata": {}, - "execution_count": 5 - } - ] - } - ] -} \ No newline at end of file From 775350e630c3bf9d3f424eb7607aea8fcceddd40 Mon Sep 17 00:00:00 2001 From: Shilpa Kancharla Date: Tue, 28 May 2024 13:30:35 -0700 Subject: [PATCH 3/6] Add entity extraction example --- .../Entity_Extraction_JSON.ipynb | 231 ++++++++++++++++++ 1 file changed, 231 insertions(+) create mode 100644 
examples/json_capabilities/Entity_Extraction_JSON.ipynb diff --git a/examples/json_capabilities/Entity_Extraction_JSON.ipynb b/examples/json_capabilities/Entity_Extraction_JSON.ipynb new file mode 100644 index 000000000..3b1a93ed8 --- /dev/null +++ b/examples/json_capabilities/Entity_Extraction_JSON.ipynb @@ -0,0 +1,231 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "T47AX_Is2FjB" + }, + "source": [ + "##### Copyright 2024 Google LLC." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "M_vx0YO92qlR" + }, + "outputs": [], + "source": [ + "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sP8PQnz1QrcF" + }, + "source": [ + "# Gemini API: Entity Extraction" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bxGr_x3MRA0z" + }, + "source": [ + "\n", + " \n", + "
\n", + " Run in Google Colab\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ysy--KfNRrCq" + }, + "source": [ + "You will use Gemini to extract all fields that fit one of the predefined classes and label them." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "Ne-3gnXqR0hI" + }, + "outputs": [], + "source": [ + "!pip install -U -q google-generativeai" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "EconMHePQHGw" + }, + "outputs": [], + "source": [ + "import google.generativeai as genai\n", + "\n", + "import json\n", + "from enum import Enum\n", + "from typing_extensions import TypedDict # in python 3.12 replace typing_extensions with typing" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eomJzCa6lb90" + }, + "source": [ + "## Configure your API key\n", + "\n", + "To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "v-JZzORUpVR2" + }, + "outputs": [], + "source": [ + "from google.colab import userdata\n", + "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n", + "\n", + "genai.configure(api_key=GOOGLE_API_KEY)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R3EUoLgJNfe7" + }, + "source": [ + "## Example" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "sShzxm3JNm6M" + }, + "outputs": [], + "source": [ + "class CategoryEnum(str, Enum):\n", + " Person = 'Person'\n", + " Company = 'Company'\n", + " State = 'State'\n", + " City = 'City'\n", + "\n", + "class Entity(TypedDict):\n", + " name: str\n", + " category: str\n", + "\n", + "class Entities(TypedDict):\n", + " entities: list[Entity]" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "QGdJnd0AOKbu" + }, + "outputs": [], + "source": [ + "entity_recongnition_text = \"Elon Musk, the CEO of Tesla and SpaceX, has unveiled plans to build a new factory in Austin, Texas. 
The factory will create 10,000 new jobs and produce electric vehicles.\"\n", + "prompt = f\"\"\"\n", + "Generate list of entities in text:\n", + "\n", + "{entity_recongnition_text}\"\"\"\n", + "model = genai.GenerativeModel(model_name='gemini-1.5-flash-latest', generation_config={\"temperature\": 0})\n", + "response = model.generate_content(prompt, generation_config={\"response_mime_type\": \"application/json\", \"response_schema\": Entities})" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "id": "d5tOgde6ONo3" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{\n", + " \"entities\": [\n", + " {\n", + " \"text\": \"Elon Musk\",\n", + " \"type\": \"PERSON\"\n", + " },\n", + " {\n", + " \"text\": \"Tesla\",\n", + " \"type\": \"ORG\"\n", + " },\n", + " {\n", + " \"text\": \"SpaceX\",\n", + " \"type\": \"ORG\"\n", + " },\n", + " {\n", + " \"text\": \"Austin\",\n", + " \"type\": \"GPE\"\n", + " },\n", + " {\n", + " \"text\": \"Texas\",\n", + " \"type\": \"GPE\"\n", + " }\n", + " ]\n", + "}\n" + ] + } + ], + "source": [ + "print(json.dumps(json.loads(response.text), indent=4))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2uv9Rikv27tf" + }, + "source": [ + "## Summary\n", + "You have used the Gemini API to extract entities of predefined categories with their labels. You extracted every person, company, state, and city. 
You are not limited to these categories, as this should work with any category of your choice.\n", + "\n", + "Please see the other notebooks in this directory to learn more about how you can use the Gemini API for other JSON related tasks.\n" + ] + } + ], + "metadata": { + "colab": { + "name": "Entity_Extraction_JSON.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} From cd99d8fbf77da8fead034d73dd423b023fb3bb70 Mon Sep 17 00:00:00 2001 From: Shilpa Kancharla Date: Tue, 28 May 2024 14:27:02 -0700 Subject: [PATCH 4/6] Colab link updated --- examples/json_capabilities/Entity_Extraction_JSON.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/json_capabilities/Entity_Extraction_JSON.ipynb b/examples/json_capabilities/Entity_Extraction_JSON.ipynb index 3b1a93ed8..e6045e64b 100644 --- a/examples/json_capabilities/Entity_Extraction_JSON.ipynb +++ b/examples/json_capabilities/Entity_Extraction_JSON.ipynb @@ -48,7 +48,7 @@ "source": [ "\n", " \n", "
\n", - " Run in Google Colab\n", + " Run in Google Colab\n", "
" ] From 9ab67256b7964938d2015fabee7402ad689a221d Mon Sep 17 00:00:00 2001 From: Shilpa Kancharla Date: Tue, 28 May 2024 14:55:05 -0700 Subject: [PATCH 5/6] Include Python classes in prompt to use 1.5 Flash --- .../Entity_Extraction_JSON.ipynb | 53 ++++++++----------- 1 file changed, 22 insertions(+), 31 deletions(-) diff --git a/examples/json_capabilities/Entity_Extraction_JSON.ipynb b/examples/json_capabilities/Entity_Extraction_JSON.ipynb index e6045e64b..3da707852 100644 --- a/examples/json_capabilities/Entity_Extraction_JSON.ipynb +++ b/examples/json_capabilities/Entity_Extraction_JSON.ipynb @@ -64,7 +64,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 8, "metadata": { "id": "Ne-3gnXqR0hI" }, @@ -75,7 +75,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 9, "metadata": { "id": "EconMHePQHGw" }, @@ -101,7 +101,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 10, "metadata": { "id": "v-JZzORUpVR2" }, @@ -124,12 +124,16 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 20, "metadata": { - "id": "sShzxm3JNm6M" + "id": "QGdJnd0AOKbu" }, "outputs": [], "source": [ + "entity_recongnition_text = \"John Johnson, the CEO of the Oil Inc. and Coal Inc. companies, has unveiled plans to build a new factory in Houston, Texas.\"\n", + "prompt = f\"\"\"\n", + "Generate list of entities in text based on the following Python class structure:\n", + "\n", "class CategoryEnum(str, Enum):\n", " Person = 'Person'\n", " Company = 'Company'\n", @@ -141,29 +145,16 @@ " category: str\n", "\n", "class Entities(TypedDict):\n", - " entities: list[Entity]" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "id": "QGdJnd0AOKbu" - }, - "outputs": [], - "source": [ - "entity_recongnition_text = \"Elon Musk, the CEO of Tesla and SpaceX, has unveiled plans to build a new factory in Austin, Texas. 
The factory will create 10,000 new jobs and produce electric vehicles.\"\n", - "prompt = f\"\"\"\n", - "Generate list of entities in text:\n", + " entities: list[Entity]\n", "\n", "{entity_recongnition_text}\"\"\"\n", "model = genai.GenerativeModel(model_name='gemini-1.5-flash-latest', generation_config={\"temperature\": 0})\n", - "response = model.generate_content(prompt, generation_config={\"response_mime_type\": \"application/json\", \"response_schema\": Entities})" + "response = model.generate_content(prompt, generation_config={\"response_mime_type\": \"application/json\"})" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 21, "metadata": { "id": "d5tOgde6ONo3" }, @@ -175,24 +166,24 @@ "{\n", " \"entities\": [\n", " {\n", - " \"text\": \"Elon Musk\",\n", - " \"type\": \"PERSON\"\n", + " \"name\": \"John Johnson\",\n", + " \"category\": \"Person\"\n", " },\n", " {\n", - " \"text\": \"Tesla\",\n", - " \"type\": \"ORG\"\n", + " \"name\": \"Oil Inc.\",\n", + " \"category\": \"Company\"\n", " },\n", " {\n", - " \"text\": \"SpaceX\",\n", - " \"type\": \"ORG\"\n", + " \"name\": \"Coal Inc.\",\n", + " \"category\": \"Company\"\n", " },\n", " {\n", - " \"text\": \"Austin\",\n", - " \"type\": \"GPE\"\n", + " \"name\": \"Houston\",\n", + " \"category\": \"City\"\n", " },\n", " {\n", - " \"text\": \"Texas\",\n", - " \"type\": \"GPE\"\n", + " \"name\": \"Texas\",\n", + " \"category\": \"State\"\n", " }\n", " ]\n", "}\n" From 1df93d94d0a4f9b005b7fc105dbdc1bdb48b8a81 Mon Sep 17 00:00:00 2001 From: Mark Daoust Date: Wed, 29 May 2024 11:14:29 -0700 Subject: [PATCH 6/6] Use the enum --- examples/json_capabilities/Entity_Extraction_JSON.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/json_capabilities/Entity_Extraction_JSON.ipynb b/examples/json_capabilities/Entity_Extraction_JSON.ipynb index 3da707852..a59e22167 100644 --- a/examples/json_capabilities/Entity_Extraction_JSON.ipynb +++ 
b/examples/json_capabilities/Entity_Extraction_JSON.ipynb @@ -142,7 +142,7 @@ "\n", "class Entity(TypedDict):\n", " name: str\n", - " category: str\n", + " category: CategoryEnum\n", "\n", "class Entities(TypedDict):\n", " entities: list[Entity]\n",
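
Note on the final state of the series: after patch 6 the class structure exists only as text inside the prompt, so nothing in the notebook checks that the returned labels actually belong to `CategoryEnum`. The sketch below is one possible way to do that check locally. It is not part of the patches. `validate_entities` and `sample_response` are hypothetical names introduced here for illustration, and the sample JSON is copied from the notebook output in patch 5, so no API call is made.

```python
import json
from enum import Enum
from typing import List, TypedDict  # the notebook imports TypedDict from typing_extensions


class CategoryEnum(str, Enum):
    Person = 'Person'
    Company = 'Company'
    State = 'State'
    City = 'City'


class Entity(TypedDict):
    name: str
    category: CategoryEnum


class Entities(TypedDict):
    entities: List[Entity]


def validate_entities(response_text: str) -> Entities:
    """Hypothetical helper: parse the JSON text and check each label against CategoryEnum."""
    data = json.loads(response_text)
    checked: List[Entity] = []
    for item in data["entities"]:
        # CategoryEnum(...) raises ValueError when the label is not one of the enum values.
        checked.append({"name": item["name"], "category": CategoryEnum(item["category"])})
    return {"entities": checked}


# Sample copied from the notebook output in patch 5 (assumed to stand in for response.text).
sample_response = """
{
    "entities": [
        {"name": "John Johnson", "category": "Person"},
        {"name": "Oil Inc.", "category": "Company"},
        {"name": "Coal Inc.", "category": "Company"},
        {"name": "Houston", "category": "City"},
        {"name": "Texas", "category": "State"}
    ]
}
"""

result = validate_entities(sample_response)
for entity in result["entities"]:
    print(f'{entity["name"]}: {entity["category"].value}')
```

In the notebook, `response.text` would take the place of `sample_response`. A label outside the four categories surfaces as a `ValueError` instead of passing through silently, which may or may not be the behavior you want for free-form text.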