Update anthropic cookbook with response speed comparison (run-llama#…
ravi03071991 authored and Izuki Matsuba committed Mar 29, 2024
1 parent e25187a commit 40e1d8f
Showing 1 changed file with 135 additions and 18 deletions.
153 changes: 135 additions & 18 deletions docs/cookbooks/anthropic_haiku.ipynb
@@ -85,21 +85,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
"LlamaIndex is a Python library that provides a set of tools and interfaces for building knowledge-based applications using large language models (LLMs) like GPT-3, GPT-J, and GPT-Neo. It is designed to make it easier to work with LLMs by providing a high-level API for tasks such as document ingestion, query answering, and knowledge extraction.\n",
"LlamaIndex is an open-source library that provides a set of tools and interfaces for building knowledge-based applications using large language models (LLMs) like GPT-3, GPT-J, and GPT-Neo. It is designed to make it easier to work with LLMs by providing a high-level API for tasks such as:\n",
"\n",
"Some key features of LlamaIndex include:\n",
"1. **Data Ingestion**: LlamaIndex supports ingesting a variety of data sources, including text files, PDFs, web pages, and databases, and organizing them into a knowledge graph.\n",
"\n",
"1. **Document Ingestion**: LlamaIndex provides a simple interface for ingesting various types of documents (e.g., text files, PDFs, web pages) and storing them in a format that can be efficiently queried.\n",
"2. **Query Handling**: LlamaIndex provides a simple and intuitive interface for querying the knowledge graph, allowing users to ask questions and get relevant information from the underlying data.\n",
"\n",
"2. **Query Answering**: LlamaIndex allows you to ask questions about the ingested documents and get relevant answers, leveraging the capabilities of the underlying LLM.\n",
"3. **Retrieval and Ranking**: LlamaIndex uses advanced retrieval and ranking algorithms to identify the most relevant information for a given query, leveraging the capabilities of the underlying LLM.\n",
"\n",
"3. **Knowledge Extraction**: LlamaIndex can be used to extract structured knowledge from the ingested documents, such as entities, relationships, and facts.\n",
"4. **Summarization and Synthesis**: LlamaIndex can generate summaries and synthesize new information based on the content of the knowledge graph, helping users to quickly understand and extract insights from large amounts of data.\n",
"\n",
"4. **Customization**: LlamaIndex is designed to be highly customizable, allowing you to use different LLMs, document storage backends, and query strategies to suit your specific needs.\n",
"5. **Extensibility**: LlamaIndex is designed to be highly extensible, allowing developers to integrate custom data sources, retrieval algorithms, and other functionality as needed.\n",
"\n",
"5. **Scalability**: LlamaIndex is designed to work with large amounts of data, making it suitable for building knowledge-intensive applications.\n",
"\n",
"LlamaIndex is particularly useful for building applications that require accessing and reasoning over large amounts of structured and unstructured data, such as question-answering systems, chatbots, and knowledge management tools. It provides a flexible and extensible framework for integrating LLMs into your application, making it easier to leverage the power of these models without having to deal with the low-level details of working with them.\n"
"The primary goal of LlamaIndex is to make it easier for developers to build knowledge-based applications that leverage the power of large language models, without having to worry about the low-level details of working with these models directly. By providing a high-level API and a set of reusable components, LlamaIndex aims to accelerate the development of a wide range of applications, from chatbots and virtual assistants to knowledge management systems and research tools.\n"
]
}
],
@@ -133,16 +131,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
"--2024-03-14 03:06:48-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/images/prometheus_paper_card.png\n",
"--2024-03-14 03:27:01-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/images/prometheus_paper_card.png\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8000::154, 2606:50c0:8001::154, 2606:50c0:8002::154, ...\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8000::154|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 1002436 (979K) [image/png]\n",
"Saving to: 'prometheus_paper_card.png'\n",
"\n",
"prometheus_paper_ca 100%[===================>] 978.94K --.-KB/s in 0.1s \n",
"prometheus_paper_ca 100%[===================>] 978.94K --.-KB/s in 0.07s \n",
"\n",
"2024-03-14 03:06:49 (9.45 MB/s) - 'prometheus_paper_card.png' saved [1002436/1002436]\n",
"2024-03-14 03:27:01 (13.3 MB/s) - 'prometheus_paper_card.png' saved [1002436/1002436]\n",
"\n"
]
}
@@ -159,7 +157,7 @@
{
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x15f0d9150>"
"<matplotlib.image.AxesImage at 0x167e83290>"
]
},
"execution_count": null,
@@ -233,13 +231,11 @@
"\n",
"The poster is divided into three main sections: Contributions, Results, and Technical Bits.\n",
"\n",
"The Contributions section introduces Prometheus as an open-source LLM evaluator that specializes in fine-grained evaluations using custom rubrics.\n",
"\n",
"The Results section highlights three key findings: 1) Prometheus matches or outperforms GPT-4 on three datasets, 2) Prometheus can function as a reward model, achieving high levels of agreement with human evaluators, and 3) Reference answers are crucial for LLM evaluations, as excluding them and then using feedback distillation led to performance degradations against other considered factors.\n",
"The Contributions section introduces Prometheus as an open-source LLM evaluator that uses custom rubrics for fine-grained evaluations. The Feedback Collection section describes a dataset designed for fine-tuning evaluator LLMs with custom, fine-grained score rubrics.\n",
"\n",
"The Technical Bits section provides a visual representation of the Feedback Collection process, which involves using GPT-4 and GPT-4 to generate score rubrics and instructions, which are then used to evaluate the input and output of the language model.\n",
"The Results section highlights three key findings: 1) Prometheus matches or outperforms GPT-4 on three datasets, and its written feedback was preferred over GPT-4 by human annotators 58.6% of the time; 2) Prometheus can function as a reward model, achieving high levels of agreement with human evaluators when re-purposed for ranking/grading tasks; and 3) reference answers are crucial for LLM evaluations, as excluding them and then using feedback distillation led to performance degradations against all other considered factors.\n",
"\n",
"The poster also includes various icons and symbols, as well as a flame icon, which appears to be a visual representation of the Prometheus project.\n"
"The Technical Bits section provides a visual overview of the Feedback Collection process, which involves using GPT-4 to generate score rubrics and\n"
]
}
],
@@ -251,6 +247,127 @@
"\n",
"print(response)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Let's compare speed of the responses from different models\n",
"\n",
"We will randomly generate 10 prompts and check the average response time."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Generate random 10 prompts"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Lists of potential subjects and actions\n",
"subjects = [\"a cat\", \"an astronaut\", \"a teacher\", \"a robot\", \"a pirate\"]\n",
"actions = [\n",
" \"is exploring a mysterious cave\",\n",
" \"finds a hidden treasure\",\n",
" \"solves a complex puzzle\",\n",
" \"invents a new gadget\",\n",
" \"discovers a new planet\",\n",
"]\n",
"\n",
"prompts = []\n",
"# Generating 10 random prompts\n",
"for _ in range(10):\n",
" subject = random.choice(subjects)\n",
" action = random.choice(actions)\n",
" prompt = f\"{subject} {action}\"\n",
" prompts.append(prompt)"
]
},
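{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sketch (not in the original notebook): seeding `random` before generating makes the 10 prompts reproducible across runs, which keeps the model-to-model timing comparison fair if you re-run cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: fix the seed and regenerate, so every run times the same 10 prompts\n",
"random.seed(42)\n",
"prompts = [\n",
"    f\"{random.choice(subjects)} {random.choice(actions)}\" for _ in range(10)\n",
"]\n",
"print(prompts)"
]
},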
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"\n",
"# Computes average response time for model and prompts\n",
"def average_response_time(model, prompts):\n",
" total_time_taken = 0\n",
" llm = Anthropic(model=model, max_tokens=300)\n",
" for prompt in prompts:\n",
" start_time = time.time()\n",
" _ = llm.complete(prompt)\n",
" end_time = time.time()\n",
" total_time_taken = total_time_taken + end_time - start_time\n",
"\n",
" return total_time_taken / len(prompts)"
]
},
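{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the timer: `time.time()` reads the wall clock, which the OS can adjust mid-run; `time.perf_counter()` is Python's recommended monotonic timer for measuring intervals. Below is a minimal variant (an addition to the original) that uses `perf_counter` and also returns the per-prompt latencies, so you can inspect the spread as well as the mean."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Variant: monotonic timer plus per-prompt latencies\n",
"def response_times(model, prompts):\n",
"    llm = Anthropic(model=model, max_tokens=300)\n",
"    latencies = []\n",
"    for prompt in prompts:\n",
"        start = time.perf_counter()\n",
"        _ = llm.complete(prompt)\n",
"        latencies.append(time.perf_counter() - start)\n",
"    return sum(latencies) / len(latencies), latencies"
]
},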
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"haiku_avg_response_time = average_response_time(\n",
" \"claude-3-haiku-20240307\", prompts\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"opus_avg_response_time = average_response_time(\n",
" \"claude-3-opus-20240229\", prompts\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sonnet_avg_response_time = average_response_time(\n",
" \"claude-3-sonnet-20240229\", prompts\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Avg. time taken by Haiku model: 3.87667396068573 seconds\n",
"Avg. time taken by Opus model: 18.772309136390685 seconds\n",
"Avg. time taken by Sonnet model: 47.86884641647339 seconds\n"
]
}
],
"source": [
"print(f\"Avg. time taken by Haiku model: {haiku_avg_response_time} seconds\")\n",
"print(f\"Avg. time taken by Opus model: {opus_avg_response_time} seconds\")\n",
"print(f\"Avg. time taken by Sonnet model: {sonnet_avg_response_time} seconds\")"
]
},
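{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Average end-to-end latency is one view of speed; for interactive apps, time to first token often matters more. Here is a hedged sketch, assuming your installed `llama-index` version exposes `stream_complete` on the `Anthropic` LLM (recent releases do):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: measure time to first streamed token instead of full-completion time\n",
"def time_to_first_token(model, prompt):\n",
"    llm = Anthropic(model=model, max_tokens=300)\n",
"    start = time.perf_counter()\n",
"    for _chunk in llm.stream_complete(prompt):\n",
"        return time.perf_counter() - start  # first chunk received\n",
"\n",
"\n",
"print(time_to_first_token(\"claude-3-haiku-20240307\", prompts[0]))"
]
}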
],
"metadata": {
