Showing 2 changed files with 253 additions and 0 deletions.
@@ -0,0 +1,151 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Use Banks to cache prompts with the Anthropic API\n",
        "\n",
        "Prompt caching allows you to store and reuse context within your prompt, saving time and money. When using Anthropic's prompt caching feature, chat messages have to be expressed as blocks rather than simple text, so that you can define the cache behaviour for each block.\n",
        "\n",
        "Let's see how Banks makes this super easy."
      ],
      "metadata": {
        "id": "JPUfjAlRUB8w"
      }
    },
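    {
      "cell_type": "markdown",
      "source": [
        "Before bringing in Banks, here is roughly what a cache-enabled message looks like in the raw Anthropic API format. This is only an illustrative sketch (the text values are made up): the message content is a list of blocks instead of a plain string, and each block can carry its own `cache_control` setting."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Illustrative sketch of the Anthropic content-block format (values are made up):\n",
        "# the second block is marked as cacheable via cache_control.\n",
        "message = {\n",
        "    \"role\": \"user\",\n",
        "    \"content\": [\n",
        "        {\"type\": \"text\", \"text\": \"Analyze this book:\"},\n",
        "        {\n",
        "            \"type\": \"text\",\n",
        "            \"text\": \"<the full book text goes here>\",\n",
        "            \"cache_control\": {\"type\": \"ephemeral\"},\n",
        "        },\n",
        "    ],\n",
        "}"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },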
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4w6N2F8gGF7q"
      },
      "outputs": [],
      "source": [
        "!pip install banks"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "To simulate a huge prompt, we'll provide Claude with a full book in the context: \"Pride and Prejudice\"."
      ],
      "metadata": {
        "id": "QF9UZVjaUsK1"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -O https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
      ],
      "metadata": {
        "id": "Ayno0BHEStAm"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "with open(\"pg1342.txt\") as f:\n",
        "    book = f.read()"
      ],
      "metadata": {
        "id": "N2EcJ1P6Svx6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "With Banks we can define which part of the prompt will be cached directly from the prompt template, using the `cache_control` built-in filter."
      ],
      "metadata": {
        "id": "xzTdJJubVGkL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "\n",
        "from litellm import completion\n",
        "\n",
        "from banks import Prompt\n",
        "\n",
        "\n",
        "tpl = \"\"\"\n",
        "{% chat role=\"user\" %}\n",
        "Analyze this book:\n",
        "\n",
        "{# Only this part of the message content (the book content) will be cached #}\n",
        "{{ book | cache_control(\"ephemeral\") }}\n",
        "\n",
        "What is the title of this book? Only output the title.\n",
        "{% endchat %}\n",
        "\"\"\"\n",
        "\n",
        "p = Prompt(tpl)\n",
        "# render the prompt in the form of a list of Banks' ChatMessage objects\n",
        "chat_messages = p.chat_messages({\"book\": book})\n",
        "# dump the ChatMessage objects into dictionaries to pass to LiteLLM\n",
        "messages_dict = [m.model_dump(exclude_none=True) for m in chat_messages]"
      ],
      "metadata": {
        "id": "7PO4397MSm-f"
      },
      "execution_count": null,
      "outputs": []
    },
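    {
      "cell_type": "markdown",
      "source": [
        "We can peek at the rendered messages to confirm the book became a cacheable content block. This check assumes the dumped message content is a list of block dictionaries, which is what the Anthropic API expects; the exact fields may vary across Banks versions."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Inspect the rendered blocks: the book block should carry\n",
        "# cache_control={\"type\": \"ephemeral\"}.\n",
        "for block in messages_dict[0][\"content\"]:\n",
        "    print(block.get(\"type\"), block.get(\"cache_control\"), repr(block.get(\"text\", \"\")[:40]))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },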
    {
      "cell_type": "code",
      "source": [
        "# First call has no cache\n",
        "start_time = time.time()\n",
        "response = completion(model=\"anthropic/claude-3-5-sonnet-20240620\", messages=messages_dict)\n",
        "\n",
        "print(f\"Non-cached API call time: {time.time() - start_time:.2f} seconds\")\n",
        "print(response.usage)\n",
        "print(response)"
      ],
      "metadata": {
        "id": "3hsJHr29ThLj"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Second call, the book is cached\n",
        "start_time = time.time()\n",
        "response = completion(model=\"anthropic/claude-3-5-sonnet-20240620\", messages=messages_dict)\n",
        "\n",
        "print(f\"Cached API call time: {time.time() - start_time:.2f} seconds\")\n",
        "print(response.usage)\n",
        "print(response)"
      ],
      "metadata": {
        "id": "8F75jH4BTZ6U"
      },
      "execution_count": null,
      "outputs": []
    },
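    {
      "cell_type": "markdown",
      "source": [
        "To verify the cache actually kicked in, look at the cache-related token counters in the usage block. The field names below are the ones the Anthropic API reports (`cache_creation_input_tokens` on the write, `cache_read_input_tokens` on the hit); whether LiteLLM surfaces them on `response.usage` can depend on the LiteLLM version, hence the defensive `getattr`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# First call: expect cache_creation_input_tokens > 0 (cache write).\n",
        "# Second call: expect cache_read_input_tokens > 0 (cache hit).\n",
        "usage = response.usage\n",
        "print(\"cache_creation_input_tokens:\", getattr(usage, \"cache_creation_input_tokens\", None))\n",
        "print(\"cache_read_input_tokens:\", getattr(usage, \"cache_read_input_tokens\", None))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    }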
  ]
}
@@ -0,0 +1,102 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LTucJvi7Xor_"
      },
      "outputs": [],
      "source": [
        "!pip install banks"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "\n",
        "os.mkdir(\"templates\")"
      ],
      "metadata": {
        "id": "iCSH4kOaczt5"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from pathlib import Path\n",
        "\n",
        "from banks import Prompt\n",
        "from banks.registries import DirectoryTemplateRegistry\n",
        "\n",
        "# Tell the registry where to store the prompt texts\n",
        "registry = DirectoryTemplateRegistry(Path(\".\") / \"templates\")\n",
        "\n",
        "# Write two versions of the same prompt, optimized for different LLMs\n",
        "blog_prompt_gpt = Prompt(\"Write a 500-word blog post on {{ topic }}.\\n\\nBlog post:\")\n",
        "blog_prompt_llama3 = Prompt(\n",
        "    \"Write a blog post about the topic {{ topic }}. Do not write more than 500 words.\\n\\n\"\n",
        "    \"Examples:\\n\"\n",
        "    \"{% for example in examples %}\"\n",
        "    \"{{ example }}\\n\"\n",
        "    \"{% endfor %}\"\n",
        "    \"\\n\\nBlog post:\"\n",
        ")\n",
        "\n",
        "# Store the two versions under the same name, using the `version` property to\n",
        "# tell them apart.\n",
        "registry.set(name=\"blog_prompt\", prompt=blog_prompt_gpt, version=\"gpt-3.5-turbo\")\n",
        "registry.set(name=\"blog_prompt\", prompt=blog_prompt_llama3, version=\"ollama/llama3.1:8b\")"
      ],
      "metadata": {
        "id": "UaSSFjnUXzMD"
      },
      "execution_count": null,
      "outputs": []
    },
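    {
      "cell_type": "markdown",
      "source": [
        "Either version can now be fetched back by name and version. As a quick sanity check, the sketch below renders the Llama 3 variant locally without calling any LLM. It assumes `registry.get` returns a `Prompt` (as the next cell does too); the topic and examples are made-up placeholder values."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Fetch the Llama 3 version of the prompt and render it locally.\n",
        "# The topic and examples are placeholder values for illustration.\n",
        "llama_prompt = registry.get(name=\"blog_prompt\", version=\"ollama/llama3.1:8b\")\n",
        "print(llama_prompt.text({\"topic\": \"prompt versioning\", \"examples\": [\"An example blog post\"]}))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },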
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from pathlib import Path\n",
        "\n",
        "from litellm import completion\n",
        "\n",
        "from banks.registries import DirectoryTemplateRegistry\n",
        "\n",
        "# Set ENV variables\n",
        "os.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"\n",
        "\n",
        "# Tell the registry where to find the stored prompts\n",
        "registry = DirectoryTemplateRegistry(Path(\".\") / \"templates\")\n",
        "\n",
        "# Fetch the version optimized for GPT-3.5 and render it\n",
        "# (the topic here is just a placeholder value for the template variable)\n",
        "prompt = registry.get(name=\"blog_prompt\", version=\"gpt-3.5-turbo\")\n",
        "response = completion(\n",
        "    model=\"gpt-3.5-turbo\",\n",
        "    messages=[{\"content\": prompt.text({\"topic\": \"prompt versioning\"}), \"role\": \"user\"}],\n",
        ")"
      ],
      "metadata": {
        "id": "IyVpMFN7dAhW"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}