Fix gradio web server demo error #3596

Open. Wants to merge 58 commits into base: main.

Commits
00feea9
Fix gradio web server demo error
tsunghan-wu Oct 22, 2024
a24a32d
Handle agent calls
tsunghan-wu Oct 24, 2024
0593516
Use native multi turn instead
Jiayi-Pan Oct 26, 2024
335397f
Fix bugs in conversation files
tsunghan-wu Oct 27, 2024
56c0537
web search + refactor agent setup
mihranmiroyan Oct 27, 2024
27d5943
update frontend ui, replace if to while loop to support multiple sear…
tsunghan-wu Oct 28, 2024
2600aa9
Merge branch 'lm-sys:main' into main
xyVickyHu Oct 28, 2024
6c073c2
Merge branch 'lm-sys:main' into handle_function_native_multiturn
xyVickyHu Oct 29, 2024
b50ad0c
Merge pull request #1 from tsunghan-wu/handle_function_native_multiturn
xyVickyHu Oct 29, 2024
cb388ea
Fix bugs for unstable llm generation
tsunghan-wu Nov 2, 2024
1a4f351
Merge branch 'handle_function_native_multiturn' of github.com:tsungha…
tsunghan-wu Nov 2, 2024
f9f2e7a
Stabalize the pipeline: (1) force the model to take at most one actio…
tsunghan-wu Nov 5, 2024
a683934
update readme
tsunghan-wu Nov 5, 2024
e43a303
add agent comparison
Jiayi-Pan Nov 6, 2024
c18ed47
add docs on agent compariosn
Jiayi-Pan Nov 6, 2024
2727002
side-by-side agents
mihranmiroyan Nov 6, 2024
1007183
add judge on agent comparison
Nov 6, 2024
a341194
Merge pull request #2 from tsunghan-wu/multi-agent
mihranmiroyan Nov 11, 2024
fce7b28
Merge pull request #3 from tsunghan-wu/handle_function_native_multiturn
mihranmiroyan Nov 11, 2024
6f8ba28
update response
mihranmiroyan Nov 11, 2024
fccbf64
Merge pull request #4 from tsunghan-wu/multi-agent
mihranmiroyan Nov 11, 2024
713254c
you search + firecrawl
mihranmiroyan Nov 11, 2024
d0d08c6
Fix the return format
tsunghan-wu Nov 11, 2024
6b9cc8b
dataset and eval script
mihranmiroyan Nov 13, 2024
dbf1efb
use openai_api to do web search
Nov 15, 2024
b042f20
remove unnecessary files
tsunghan-wu Nov 15, 2024
9befb68
Finish openai agent with a more general interface
tsunghan-wu Nov 15, 2024
e7bfeeb
merge openai's function call api with web-search
tsunghan-wu Nov 15, 2024
337630a
Completely merge the function api call with web search
tsunghan-wu Nov 15, 2024
1c7cefa
update .gitignore
mihranmiroyan Nov 15, 2024
875724f
function calling api evals
mihranmiroyan Nov 15, 2024
80bad27
Merge pull request #5 from tsunghan-wu/web-search
mihranmiroyan Nov 18, 2024
8f5292b
new eval sets
mihranmiroyan Nov 18, 2024
f021491
gpt-4 labels for arena hard
mihranmiroyan Nov 18, 2024
afe5461
Finish Anthropic function calling support
tsunghan-wu Nov 18, 2024
8cefdd1
Finish nvidia llama 3.1 function call api
tsunghan-wu Nov 18, 2024
63e2193
Finish gemini, anthropic, llama3.1, and gpt-4o
tsunghan-wu Nov 19, 2024
5ee7994
Merge pull request #6 from tsunghan-wu/more_api_support
tsunghan-wu Nov 21, 2024
912a694
initial sampling
mihranmiroyan Nov 21, 2024
b0e210c
small arena hard
mihranmiroyan Nov 21, 2024
f7ce345
eval samples
mihranmiroyan Nov 21, 2024
4c3f10e
eval scripts
mihranmiroyan Nov 29, 2024
f71493f
Add multiple output files
tsunghan-wu Dec 1, 2024
9372c1f
add arena hard evaluation table
tsunghan-wu Dec 1, 2024
8b7d00d
perplexity samples + results
mihranmiroyan Dec 1, 2024
2ada366
gpt-4o serp
mihranmiroyan Dec 1, 2024
31407f4
gpt-4o-mini + serp
mihranmiroyan Dec 1, 2024
aae2b05
serp + arena hard
mihranmiroyan Dec 1, 2024
bca62e7
update results and new evaluation script
tsunghan-wu Dec 2, 2024
da93299
update arena hard evaluation for serp apis
tsunghan-wu Dec 2, 2024
e9173a6
update claude results
tsunghan-wu Dec 2, 2024
3e4a53d
add gemini agent
tsunghan-wu Dec 2, 2024
8be9f2b
add arenahard_analysis
Jiayi-Pan Dec 2, 2024
f9ed148
gemini/claude eval results + visualization notebook
mihranmiroyan Dec 2, 2024
d257cf8
Merge branch 'main' of https://github.com/tsunghan-wu/Agent_FastChat
mihranmiroyan Dec 2, 2024
31d5751
remove network errors, add qualitative examples
mihranmiroyan Dec 3, 2024
bec5042
add gemini flash result
tsunghan-wu Dec 3, 2024
59079ec
rendering changes, attempting to stream FC
mihranmiroyan Dec 4, 2024
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,16 +1,20 @@
# Python
agent_api_endpoints.json
__pycache__
*.pyc
*.egg-info
dist
.venv
keys.env

# Log
*.log
*.log.*
*.json
!playground/deepspeed_config_s2.json
!playground/deepspeed_config_s3.json
!evaluation/**/*.json
!evaluation/**/*.csv

# Editor
.idea
@@ -21,10 +25,12 @@ dist
wandb
output
checkpoints_flant5_3b
.gradio/

# Data
*.pkl
*.csv
!evaluation/**/*.csv
tests/state_of_the_union.txt

# Build
8 changes: 8 additions & 0 deletions README.md
@@ -1,4 +1,12 @@
# FastChat

## Latest Update and TODOs

- [ ] Enable Google search function call (by 10/28/2024)
- [x] Modify the FastChat codebase to support function calling during chat for OpenAI GPT-4. See [`docs/agent.md`](./docs/agent.md) for details.
- [x] Complete the Google search function. Currently, it is a prototype function at [`fastchat/tools/search.py`](./fastchat/tools/search.py).
- [ ] Make the agent call scalable for more LLMs (in addition to OpenAI's API models).

| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/6GXcFg3TH8) | [**X**](https://x.com/lmsysorg) |

FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
86 changes: 86 additions & 0 deletions docs/agent.md
@@ -0,0 +1,86 @@
# Agent Arena Working Area

## The latest status

- Done:
- [x] Complete the basic google search function in `fastchat/tools/search.py`. The pipeline now works for OpenAI search.
- [x] Find some successful and failure cases using our naive search tool.
- TODOs:
  - [ ] Add an option to show or hide the web search result (UI-related work).
  - [ ] Extend the search functions to other LLMs.
- [ ] Run our pipeline on Arena Datasets to see if this naive search is sufficient.

- Note: Please run `./format.sh` before merging into the main branch.

**Note**: Please install packages and ensure you can successfully execute [Launch a WebUI with an API Model](https://github.com/tsunghan-wu/Agent_FastChat/blob/main/docs/model_support.md#api-based-models).

## Launch agent-enabled Chatbot Arena (for OpenAI APIs currently)

1. Specify the endpoint information in a JSON configuration file. For instance, create a file named `agent_api_endpoints.json`:

```json
{
"gpt4o": {
"model_name": "gpt-4o-2024-08-06",
"api_type": "openai",
"api_base": "https://api.openai.com/v1",
"api_key": "sk-******",
"anony_only": false,
"recommended_config": {
"temperature": 0.7,
"top_p": 1.0
},
"text-arena": true,
"vision-arena": false,
"agent-mode": true
}
}
```

2. Launch the Gradio web server with the argument `--register agent_api_endpoints.json`:

```bash
python3 -m fastchat.serve.gradio_web_server_agent --controller "" --share --register agent_api_endpoints.json
```

Now, you can open a browser and interact with the model.
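
Before launching, it can help to sanity-check the registry file. A minimal sketch; the required-key list below is an assumption drawn from the example entry, not a schema the codebase enforces:

```python
import json

# Keys the example endpoint entry above carries. Treat this set as an
# assumption based on the sample config, not an enforced schema.
REQUIRED_KEYS = {"model_name", "api_type", "api_base", "api_key"}

def validate_registry(path):
    """Load an endpoint registry and report entries missing required keys."""
    with open(path) as f:
        registry = json.load(f)
    problems = {}
    for name, cfg in registry.items():
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            problems[name] = sorted(missing)
    return problems

# An empty dict means every entry carries the required keys:
# validate_registry("agent_api_endpoints.json")
```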

## Examples:

1. Using Agents

```
User: What's the weather today?

GPT-4:
{
"thought": "The query asks about the current weather, which is dynamic and location-specific information. Since I don't have direct access to real-time weather data, I should use the available tool to search for the current weather conditions. The 'google_search' tool can help find this information by querying with relevant keywords.",
"action": {
"name": "google_search",
"reason": "To obtain the current weather information from online sources.",
"arguments": {
"key_words": ["current weather"],
"topk": 1
}
}
}

Then we execute the google_search function in fastchat/tools/search.py, which currently returns only a pre-defined string. Follow-up (multi-turn) chat also works.
```

2. Without using agents

```
User: How are you?
GPT-4:
{
"thought": "The query is asking for a status update or well-being check on myself as an assistant. This is a common conversational question and doesn't require additional information from external sources. I can answer this directly based on my designed functionality.",
"answer": "I'm just a virtual assistant, so I don't have feelings or states of being, but I'm here and ready to help you with any questions or tasks you have!"
}
```
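
The two response shapes above (an `action` to execute versus a direct `answer`) can be routed with a small helper. A sketch, assuming the model reliably emits one of the two JSON forms; `run_search` here is a hypothetical stand-in for the tool in `fastchat/tools/search.py`:

```python
import json

def run_search(key_words, topk=1):
    # Hypothetical stand-in for fastchat/tools/search.py, which currently
    # returns a pre-defined string rather than live results.
    return f"[search results for {key_words!r}, top {topk}]"

def dispatch(model_output: str) -> str:
    """Route a model reply: execute its requested action, or return its answer."""
    reply = json.loads(model_output)
    if "action" in reply:
        action = reply["action"]
        if action["name"] == "google_search":
            args = action["arguments"]
            return run_search(args["key_words"], args.get("topk", 1))
        raise ValueError(f"unknown action: {action['name']}")
    return reply["answer"]
```

In the real pipeline the search result is fed back to the model for another turn; this sketch stops at the tool call.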

## Comparing Responses Between Agent and Non-Agent Modes

You can use the `compare_agents.ipynb` notebook to compare responses between a standard LM and one augmented with our search tool:
1. Start the server as usual
2. Run the notebook
121 changes: 121 additions & 0 deletions evaluation/archive/compare_agents.ipynb
@@ -0,0 +1,121 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"from random import randint\n",
"import json\n",
"\n",
"# Login using e.g. `huggingface-cli login` to access this dataset\n",
"ds = load_dataset(\"lmsys/lmsys-chat-1m\")['train']\n",
    "sample_idxs = [randint(0, len(ds) - 1) for _ in range(300)]\n",
"samples = [ds[i] for i in sample_idxs]\n",
"single_turn_samples = [s for s in samples if len(s['conversation']) == 2]\n",
"prompts = [s['conversation'][0]['content'] for s in single_turn_samples]\n",
"with open('prompts.json', 'w') as f:\n",
" json.dump(prompts, f, indent=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"prompts = json.load(open(\"prompts.json\"))[:100]\n",
"server_url = \"https://e1f18acc28cf24eea6.gradio.live/\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"def get_response_standard(prompt):\n",
" system_prompt = \"You are a helpful assistant.\"\n",
" client = OpenAI()\n",
" completion = client.chat.completions.create(\n",
" model=\"gpt-4o\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" temperature=1.0,\n",
" top_p=0.7,\n",
" max_tokens=512,\n",
" )\n",
"\n",
" return completion.choices[0].message.content\n",
"\n",
"\n",
"\n",
"from gradio_client import Client\n",
"def get_response_agent(prompt):\n",
    "    client = Client(server_url)\n",
" result = client.predict(\n",
" model_selector=\"react-agent\",\n",
" text=prompt,\n",
" api_name=\"/add_text_1\")\n",
" out = client.predict(\n",
" temperature=1.0,\n",
" top_p=0.7,\n",
" max_new_tokens=512,\n",
" api_name=\"/bot_response_2\"\n",
" )\n",
" return out[0][1]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tqdm\n",
"responses = []\n",
"for prompt in tqdm.tqdm(prompts):\n",
" agent_response = get_response_agent(prompt)\n",
" standard_response = get_response_standard(prompt)\n",
" responses.append({\"prompt\": prompt, \"agent_response\": agent_response, \"standard_response\": standard_response})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"responses.json\", \"w\") as f:\n",
" json.dump(responses, f, indent=2)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}