diff --git a/examples/Enhance_your_prompts_with_meta_prompting.ipynb b/examples/Enhance_your_prompts_with_meta_prompting.ipynb new file mode 100644 index 0000000000..660fd679f2 --- /dev/null +++ b/examples/Enhance_your_prompts_with_meta_prompting.ipynb @@ -0,0 +1,953 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Meta Prompting: A Guide to Automated Prompt Optimization\n", + "\n", + "Welcome to our cookbook on meta prompting! In this guide, we'll explore how to take a basic prompt and refine it to enhance the quality of outputs from a language model. We'll use the example of summarizing news reports to illustrate the process.\n", + "\n", + "\n", + "Meta prompting is a technique where you use an LLM to generate or improve prompts. Typically, a more capable model optimizes prompts for a less capable one. It’s a process of using prompts to guide, structure, and optimize other prompts, so that they steer the LLM more reliably towards high-quality, relevant outputs. We'll use `o1-preview`, a model with advanced reasoning skills, to improve a prompt for `gpt-4o-mini`.\n", + "\n", + "We're committed to making your development journey with LLMs smoother and more accessible through this technique. Don't forget to check out our [Generate Anything](https://platform.openai.com/docs/guides/prompt-generation) feature in the playground. It's a fantastic starting point to dive into meta prompting.\n", + "\n", + "In this example, we'll begin with a simple prompt for summarizing news articles and then enhance it to see how the outputs improve. We'll use `o1-preview` to analyze and refine our prompt, adding more detail and clarity along the way. Finally, we'll evaluate the outputs systematically to understand the impact of our refinements." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", + " from .autonotebook import tqdm as notebook_tqdm\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "import openai \n", + "from concurrent.futures import ThreadPoolExecutor, as_completed\n", + "from tqdm import tqdm\n", + "from pydantic import BaseModel\n", + "from datasets import load_dataset\n", + "\n", + "client = openai.Client()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Importing the Data\n", + "\n", + "Let's kick things off by importing the `bbc_news_alltime` dataset from [HuggingFace](https://huggingface.co/datasets/RealTimeData/bbc_news_alltime). This dataset contains all BBC News articles, capturing everything published monthly from 2017 up to the latest complete month. For our experiment, we'll focus exclusively on a sample from a recent month—August 2024—to keep things current and manageable.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " title published_date \\\n", + "2662 Laura Whitmore: I was gaslighted after raising... 2024-08-04 \n", + "1865 Errollyn Wallen appointed as Master of the Kin... 2024-08-25 \n", + "2554 SDLP: Matthew O'Toole endorses Claire Hanna fo... 2024-08-30 \n", + "1338 Rotherham rioters among those jailed - BBC News 2024-08-20 \n", + "1232 BBC News - BBC iPlayer 2024-08-02 \n", + "\n", + " authors \\\n", + "2662 https://www.facebook.com/bbcnews \n", + "1865 https://www.facebook.com/bbcnews \n", + "2554 https://www.facebook.com/bbcnews \n", + "1338 https://www.facebook.com/bbcnews \n", + "1232 None \n", + "\n", + " description \\\n", + "2662 The former Love Island host said that things s... \n", + "1865 She is best known for her work on the 2012 Par... \n", + "2554 Matthew O'Toole had been named by some as a po... \n", + "1338 Two men who were part of a mob targeting a Hol... \n", + "1232 None \n", + "\n", + " section \\\n", + "2662 Culture \n", + "1865 Culture \n", + "2554 Northern Ireland Politics \n", + "1338 South Yorkshire \n", + "1232 None \n", + "\n", + " content \\\n", + "2662 Television presenter Laura Whitmore has said t... \n", + "1865 Celebrated composer and singer-songwriter Erro... \n", + "2554 Matthew O'Toole leads his party's official opp... \n", + "1338 Rotherham pair among those jailed for UK rioti... \n", + "1232 JavaScript seems to be disabled. Please enable... \n", + "\n", + " link \\\n", + "2662 http://www.bbc.co.uk/news/articles/c9wvwvzm7x7o \n", + "1865 http://www.bbc.co.uk/news/articles/c4gl758g7zgo \n", + "2554 http://www.bbc.co.uk/news/articles/cvg41j7xrzdo \n", + "1338 http://www.bbc.co.uk/news/articles/cwywggd7qw6o \n", + "1232 http://www.bbc.co.uk/news/10318089 \n", + "\n", + " top_image \n", + "2662 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1865 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "2554 https://ichef.bbci.co.uk/ace/standard/3840/cps... 
\n", + "1338 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1232 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds = load_dataset(\"RealTimeData/bbc_news_alltime\", \"2024-08\")\n", + "df = pd.DataFrame(ds['train']).sample(n=100, random_state=1)\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Iterating on Prompts\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's start with a straightforward prompt and then use `o1-preview` to enhance it for better results. We want to summarize news articles, so that is what we'll ask the model to do. " + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "simple_prompt = \"Summarize this news article: {article}\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To improve the prompt, we need to provide `o1-preview` with the context and goals we want to achieve. We can then ask it to generate a more detailed prompt that would produce richer and more comprehensive news summaries." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "meta_prompt = \"\"\"\n", + "Improve the following prompt to generate a more detailed summary. \n", + "Adhere to prompt engineering best practices. \n", + "Make sure the structure is clear and intuitive and contains the type of news, tags and sentiment analysis.\n", + "\n", + "{simple_prompt}\n", + "\n", + "Only return the prompt.\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Please read the following news article and provide a comprehensive summary that includes:\\n\\n1. **Type of News**: Specify the category of the news article (e.g., Politics, Technology, Health, Sports, etc.).\\n2. 
**Summary**: Write a concise and clear summary of the main points, ensuring the structure is logical and intuitive.\\n3. **Tags**: List relevant keywords or tags associated with the article.\\n4. **Sentiment Analysis**: Analyze the overall sentiment of the article (positive, negative, or neutral) and briefly explain your reasoning.\\n\\n**Article:**\\n\\n{article}'" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def get_model_response(messages, model=\"o1-preview\"):\n", + " response = client.chat.completions.create(\n", + " messages=messages,\n", + " model=model,\n", + " )\n", + " return response.choices[0].message.content\n", + "\n", + "\n", + "complex_prompt = get_model_response([{\"role\": \"user\", \"content\": meta_prompt.format(simple_prompt=simple_prompt)}])\n", + "complex_prompt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generating the Summaries\n", + "\n", + "Now that we have both prompts, let's generate the summaries! For each entry in our dataset, we'll use both the simple and the enhanced prompts to see how they compare. By doing this, we'll get a firsthand look at how our refinements with `o1-preview` can lead to richer and more detailed summaries. Let's dive in and see the difference for ourselves!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_response(prompt):\n", + "    messages = [{\"role\": \"user\", \"content\": prompt}]\n", + "    return get_model_response(messages, model=\"gpt-4o-mini\")\n", + "\n", + "def generate_summaries(row):\n", + "    article = row[\"content\"]\n", + "    simple_summary = generate_response(simple_prompt.format(article=article))\n", + "    # Fill the {article} placeholder if the generated prompt kept it; otherwise append the article\n", + "    if \"{article}\" in complex_prompt:\n", + "        complex_summary = generate_response(complex_prompt.replace(\"{article}\", article))\n", + "    else:\n", + "        complex_summary = generate_response(complex_prompt + article)\n", + "    return simple_summary, complex_summary" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's check that everything works by generating both summaries for the first news report. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "('Television presenter Laura Whitmore has shared that the issues she attempted to address during her time on *Strictly Come Dancing* eight years ago are now surfacing, stating that she experienced \"gaslighting\" that made her concerns seem normalized. In a recent interview, she expressed the difficulties she faced, including being portrayed negatively and feeling \"broken\" during the competition. Whitmore indicated that she raised concerns about inappropriate behavior and is currently providing evidence for a BBC investigation, although she has not made an official complaint herself. The BBC is facing allegations of mistreatment towards contestants, prompting them to announce new welfare measures, including the presence of a chaperone during rehearsals. Other celebrities participating in the show have also made allegations against professional dancers, leading to growing scrutiny around conditions on the show. The BBC emphasized that it takes complaints very seriously and is committed to updating its support processes.',\n", + " '1. **Type of News**: Entertainment\\n\\n2. 
**Summary**: Laura Whitmore, a television presenter, has spoken out about her experiences on Strictly Come Dancing, revealing that issues she attempted to address during her tenure on the show are now coming to light. In an interview with The Irish Times, she described feeling \"gaslit\" and suggested that her concerns, which she raised eight years ago, were not taken seriously at the time. Whitmore recalled that her participation left her feeling \"broken\" and criticized how she was portrayed during the show. She mentioned contributing evidence to an ongoing review involving incidents of alleged inappropriate behavior during her time on the show, although she did not make an official complaint. The BBC, which has been navigating its own controversy related to the treatment of contestants, stated it is taking these claims seriously and plans to enhance welfare measures on the show, including the introduction of a chaperone at rehearsals. Recent allegations from other contestants have further intensified the scrutiny of Strictly Come Dancing.\\n\\n3. **Tags**: Laura Whitmore, Strictly Come Dancing, BBC, allegations, inappropriate behavior, gaslighting, welfare measures, entertainment controversy\\n\\n4. **Sentiment Analysis**: The overall sentiment of the article is negative. It highlights serious allegations of mistreatment and inappropriate behavior associated with a popular television show, along with personal accounts from Whitmore that reflect emotional distress and professional struggles. 
The tone conveys a sense of urgency and seriousness regarding the issues raised, indicating a critical atmosphere within the entertainment industry related to contestant treatment.')" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "generate_summaries(df.iloc[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By comparing the summaries generated from the simple and enhanced prompts, we can already see significant improvements. The initial summary gives us a general overview of the article, whereas the enhanced summary dives deeper — it not only provides a detailed summary but also categorizes the news type, lists relevant tags, and even includes a sentiment analysis.\n", + "\n", + "Let's test on the entire dataset now! " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Generating Itineraries: 100%|██████████| 100/100 [00:50<00:00, 1.98it/s]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " title published_date \\\n", + "2662 Laura Whitmore: I was gaslighted after raising... 2024-08-04 \n", + "1865 Errollyn Wallen appointed as Master of the Kin... 2024-08-25 \n", + "2554 SDLP: Matthew O'Toole endorses Claire Hanna fo... 2024-08-30 \n", + "1338 Rotherham rioters among those jailed - BBC News 2024-08-20 \n", + "1232 BBC News - BBC iPlayer 2024-08-02 \n", + "\n", + " authors \\\n", + "2662 https://www.facebook.com/bbcnews \n", + "1865 https://www.facebook.com/bbcnews \n", + "2554 https://www.facebook.com/bbcnews \n", + "1338 https://www.facebook.com/bbcnews \n", + "1232 None \n", + "\n", + " description \\\n", + "2662 The former Love Island host said that things s... \n", + "1865 She is best known for her work on the 2012 Par... \n", + "2554 Matthew O'Toole had been named by some as a po... \n", + "1338 Two men who were part of a mob targeting a Hol... \n", + "1232 None \n", + "\n", + " section \\\n", + "2662 Culture \n", + "1865 Culture \n", + "2554 Northern Ireland Politics \n", + "1338 South Yorkshire \n", + "1232 None \n", + "\n", + " content \\\n", + "2662 Television presenter Laura Whitmore has said t... \n", + "1865 Celebrated composer and singer-songwriter Erro... \n", + "2554 Matthew O'Toole leads his party's official opp... \n", + "1338 Rotherham pair among those jailed for UK rioti... \n", + "1232 JavaScript seems to be disabled. Please enable... \n", + "\n", + " link \\\n", + "2662 http://www.bbc.co.uk/news/articles/c9wvwvzm7x7o \n", + "1865 http://www.bbc.co.uk/news/articles/c4gl758g7zgo \n", + "2554 http://www.bbc.co.uk/news/articles/cvg41j7xrzdo \n", + "1338 http://www.bbc.co.uk/news/articles/cwywggd7qw6o \n", + "1232 http://www.bbc.co.uk/news/10318089 \n", + "\n", + " top_image \\\n", + "2662 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1865 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "2554 https://ichef.bbci.co.uk/ace/standard/3840/cps... 
\n", + "1338 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1232 \n", + "\n", + " simple_summary \\\n", + "2662 Television presenter Laura Whitmore has spoken... \n", + "1865 Errollyn Wallen has been appointed Master of t... \n", + "2554 Matthew O'Toole, the leader of the official op... \n", + "1338 Two men, Nathan Palmer (29) and Niven Matthewm... \n", + "1232 The article discusses the need to enable JavaS... \n", + "\n", + " complex_summary \n", + "2662 1. **Type of News**: Entertainment/Television\\... \n", + "1865 1. **Type of News**: Arts/Music\\n\\n2. **Summar... \n", + "2554 1. **Type of News**: Politics\\n\\n2. **Summary*... \n", + "1338 1. **Type of News**: Politics / Crime and Just... \n", + "1232 I cannot provide a summary of the article as t... " + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Add new columns to the dataframe for storing summaries\n", + "df['simple_summary'] = None\n", + "df['complex_summary'] = None\n", + "\n", + "# Use ThreadPoolExecutor to generate summaries concurrently\n", + "with ThreadPoolExecutor() as executor:\n", + "    futures = {executor.submit(generate_summaries, row): index for index, row in df.iterrows()}\n", + "    for future in tqdm(as_completed(futures), total=len(futures), desc=\"Generating Itineraries\"):\n", + "        index = futures[future]\n", + "        simple_summary, complex_summary = future.result()\n", + "        df.at[index, 'simple_summary'] = simple_summary\n", + "        df.at[index, 'complex_summary'] = complex_summary\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluating the Results\n", + "\n", + "To assess the difference in performance between the two prompts, we'll use a structured evaluation approach with the LLM acting as a judge. 
This means we'll leverage the language model itself to evaluate and compare the outputs based on specific criteria.\n", + "\n", + "**What Does \"LLM as a Judge\" Mean?**\n", + "\n", + "Using an LLM as a judge involves having the language model evaluate its own outputs or those of another model. It applies predefined criteria to assess aspects like accuracy, clarity, and relevance. This gives us a consistent, scalable evaluation that does not depend on individual human raters, although the judge model can have biases of its own. It makes it easier to identify improvements between different prompts. Our cookbook on [Getting Started with OpenAI Evals](https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals) offers a glimpse of how you can get started with this approach.\n", + "\n", + "\n", + "Here's the prompt we'll use for evaluation:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "evaluation_prompt = \"\"\"\n", + "You are an expert editor tasked with evaluating the quality of a news article summary. Below is the original article and the summary to be evaluated:\n", + "\n", + "**Original Article**: \n", + "{original_article}\n", + "\n", + "**Summary**: \n", + "{summary}\n", + "\n", + "Please evaluate the summary based on the following criteria, using a scale of 1 to 5 (1 being the lowest and 5 being the highest). Be critical in your evaluation and only give high scores for exceptional summaries:\n", + "\n", + "1. **Categorization and Context**: Does the summary clearly identify the type or category of news (e.g., Politics, Technology, Sports) and provide appropriate context? \n", + "2. **Keyword and Tag Extraction**: Does the summary include relevant keywords or tags that accurately capture the main topics and themes of the article? \n", + "3. **Sentiment Analysis**: Does the summary accurately identify the overall sentiment of the article and provide a clear, well-supported explanation for this sentiment? \n", + "4. 
**Clarity and Structure**: Is the summary clear, well-organized, and structured in a way that makes it easy to understand the main points? \n", + "5. **Detail and Completeness**: Does the summary provide a detailed account that includes all necessary components (type of news, tags, sentiment) comprehensively? \n", + "\n", + "\n", + "Provide your scores and justifications for each criterion, ensuring a rigorous and detailed evaluation.\n", + "\"\"\"\n", + "\n", + "class ScoreCard(BaseModel):\n", + " categorization: int\n", + " keyword_extraction: int\n", + " sentiment_analysis: int\n", + " clarity_structure: int\n", + " detail_completeness: int\n", + " justification: str" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's a pro tip — you can actually use meta prompting to refine your evaluation prompt as well! By applying the same iterative enhancement to the prompt that instructs the LLM to act as a judge, you can make your evaluations even more precise and insightful. \n", + "\n", + "Let's use this prompt to evaluate our summaries!" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Evaluating Summaries: 100%|██████████| 100/100 [01:42<00:00, 1.02s/it]\n" + ] + }, + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " title published_date \\\n", + "2662 Laura Whitmore: I was gaslighted after raising... 2024-08-04 \n", + "1865 Errollyn Wallen appointed as Master of the Kin... 2024-08-25 \n", + "2554 SDLP: Matthew O'Toole endorses Claire Hanna fo... 2024-08-30 \n", + "1338 Rotherham rioters among those jailed - BBC News 2024-08-20 \n", + "1232 BBC News - BBC iPlayer 2024-08-02 \n", + "\n", + " authors \\\n", + "2662 https://www.facebook.com/bbcnews \n", + "1865 https://www.facebook.com/bbcnews \n", + "2554 https://www.facebook.com/bbcnews \n", + "1338 https://www.facebook.com/bbcnews \n", + "1232 None \n", + "\n", + " description \\\n", + "2662 The former Love Island host said that things s... \n", + "1865 She is best known for her work on the 2012 Par... \n", + "2554 Matthew O'Toole had been named by some as a po... \n", + "1338 Two men who were part of a mob targeting a Hol... \n", + "1232 None \n", + "\n", + " section \\\n", + "2662 Culture \n", + "1865 Culture \n", + "2554 Northern Ireland Politics \n", + "1338 South Yorkshire \n", + "1232 None \n", + "\n", + " content \\\n", + "2662 Television presenter Laura Whitmore has said t... \n", + "1865 Celebrated composer and singer-songwriter Erro... \n", + "2554 Matthew O'Toole leads his party's official opp... \n", + "1338 Rotherham pair among those jailed for UK rioti... \n", + "1232 JavaScript seems to be disabled. Please enable... \n", + "\n", + " link \\\n", + "2662 http://www.bbc.co.uk/news/articles/c9wvwvzm7x7o \n", + "1865 http://www.bbc.co.uk/news/articles/c4gl758g7zgo \n", + "2554 http://www.bbc.co.uk/news/articles/cvg41j7xrzdo \n", + "1338 http://www.bbc.co.uk/news/articles/cwywggd7qw6o \n", + "1232 http://www.bbc.co.uk/news/10318089 \n", + "\n", + " top_image \\\n", + "2662 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1865 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "2554 https://ichef.bbci.co.uk/ace/standard/3840/cps... 
\n", + "1338 https://ichef.bbci.co.uk/ace/standard/2560/cps... \n", + "1232 \n", + "\n", + " simple_summary \\\n", + "2662 Television presenter Laura Whitmore has spoken... \n", + "1865 Errollyn Wallen has been appointed Master of t... \n", + "2554 Matthew O'Toole, the leader of the official op... \n", + "1338 Two men, Nathan Palmer (29) and Niven Matthewm... \n", + "1232 The article discusses the need to enable JavaS... \n", + "\n", + " complex_summary \\\n", + "2662 1. **Type of News**: Entertainment/Television\\... \n", + "1865 1. **Type of News**: Arts/Music\\n\\n2. **Summar... \n", + "2554 1. **Type of News**: Politics\\n\\n2. **Summary*... \n", + "1338 1. **Type of News**: Politics / Crime and Just... \n", + "1232 I cannot provide a summary of the article as t... \n", + "\n", + " simple_evaluation \\\n", + "2662 categorization=4 keyword_extraction=3 sentimen... \n", + "1865 categorization=4 keyword_extraction=4 sentimen... \n", + "2554 categorization=5 keyword_extraction=4 sentimen... \n", + "1338 categorization=3 keyword_extraction=3 sentimen... \n", + "1232 categorization=2 keyword_extraction=3 sentimen... \n", + "\n", + " complex_evaluation \n", + "2662 categorization=5 keyword_extraction=5 sentimen... \n", + "1865 categorization=5 keyword_extraction=5 sentimen... \n", + "2554 categorization=5 keyword_extraction=5 sentimen... \n", + "1338 categorization=5 keyword_extraction=4 sentimen... \n", + "1232 categorization=1 keyword_extraction=1 sentimen... 
" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def evaluate_summaries(row):\n", + "    simple_messages = [{\"role\": \"user\", \"content\": evaluation_prompt.format(original_article=row[\"content\"], summary=row['simple_summary'])}]\n", + "    complex_messages = [{\"role\": \"user\", \"content\": evaluation_prompt.format(original_article=row[\"content\"], summary=row['complex_summary'])}]\n", + "\n", + "    # Score each summary with the judge model and parse the reply into a ScoreCard\n", + "    simple_eval = client.beta.chat.completions.parse(\n", + "        model=\"gpt-4o\",\n", + "        messages=simple_messages,\n", + "        response_format=ScoreCard)\n", + "    simple_eval = simple_eval.choices[0].message.parsed\n", + "\n", + "    complex_eval = client.beta.chat.completions.parse(\n", + "        model=\"gpt-4o\",\n", + "        messages=complex_messages,\n", + "        response_format=ScoreCard)\n", + "    complex_eval = complex_eval.choices[0].message.parsed\n", + "\n", + "    return simple_eval, complex_eval\n", + "\n", + "# Add new columns to the dataframe for storing evaluations\n", + "df['simple_evaluation'] = None\n", + "df['complex_evaluation'] = None\n", + "\n", + "# Use ThreadPoolExecutor to evaluate summaries concurrently\n", + "with ThreadPoolExecutor() as executor:\n", + "    futures = {executor.submit(evaluate_summaries, row): index for index, row in df.iterrows()}\n", + "    for future in tqdm(as_completed(futures), total=len(futures), desc=\"Evaluating Summaries\"):\n", + "        index = futures[future]\n", + "        simple_evaluation, complex_evaluation = future.result()\n", + "        df.at[index, 'simple_evaluation'] = simple_evaluation\n", + "        df.at[index, 'complex_evaluation'] = complex_evaluation\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "df[\"simple_scores\"] = df[\"simple_evaluation\"].apply(lambda x: [score for key, score in x.model_dump().items() if key != 'justification'])\n", + "df[\"complex_scores\"] = df[\"complex_evaluation\"].apply(lambda x: [score for key, score in x.model_dump().items() if key != 'justification'])\n", + "\n", + "\n", + "# Calculate average scores for each criterion\n", + "criteria = [\n", + " 'Categorisation',\n", + " 'Keywords and Tags',\n", + " 'Sentiment Analysis',\n", + " 'Clarity and Structure',\n", + " 'Detail and Completeness'\n", + "]\n", + "\n", + "# Calculate average scores for each criterion by model\n", + "simple_avg_scores = df['simple_scores'].apply(pd.Series).mean()\n", + "complex_avg_scores = df['complex_scores'].apply(pd.Series).mean()\n", + "\n", + "\n", + "# Prepare data for plotting\n", + "avg_scores_df = pd.DataFrame({\n", + " 'Criteria': criteria,\n", + " 'Original Prompt': simple_avg_scores,\n", + " 'Improved Prompt': complex_avg_scores\n", + "})\n", + "\n", + "# Plotting\n", + "ax = avg_scores_df.plot(x='Criteria', kind='bar', figsize=(6, 4))\n", + "plt.ylabel('Average Score')\n", + "plt.title('Comparison of Simple vs Complex Prompt Performance by Model')\n", + "plt.xticks(rotation=45, ha='right')\n", + "plt.tight_layout()\n", + "plt.legend(loc='upper left', bbox_to_anchor=(1, 1))\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After evaluating the results, we found that while the basic prompt performed well in clarity and structure, the enhanced prompt significantly improved outputs across several other key criteria: Categorization, Keywords and Tags, Sentiment Analysis, and Detail and Completeness. 
The complex prompt led to summaries that were more informative, better organized, and richer in content.\n",
+ "\n",
+ "This shows how refining a prompt can improve the quality of the generated summaries. Although this is a simplified example, similar gains can be expected in production applications, where an optimized prompt yields outputs that are more closely aligned with specific goals and user needs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "Meta prompting can substantially improve the quality of outputs from language models. Starting with a simple prompt and refining it with `o1-preview` led to summaries that were more informative, better organized, and richer in content, with higher scores for categorization, keywords and tags, sentiment analysis, and detail and completeness. Even in this simplified example, the value of prompt optimization is clear. In real-world applications, meta prompting with models like `o1-preview` can help language model outputs better meet your specific goals and user needs."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/registry.yaml b/registry.yaml index ad46334dbe..b221875a2b 100644 --- a/registry.yaml +++ b/registry.yaml @@ -1645,6 +1645,15 @@ - completions - audio +- title: Enhance your prompts with meta prompting + path: examples/Enhance_your_prompts_with_meta_prompting.ipynb + date: 2024-10-23 + authors: + - teomusatoiu + tags: + - completions + - reasoning + - title: GPT Actions library - GitHub path: examples/chatgpt/gpt_actions_library/gpt_action_github.md date: 2024-10-23