Fix gradio web server demo error #3596

Open. Wants to merge 58 commits into base: main.

Commits
00feea9
Fix gradio web server demo error
tsunghan-wu Oct 22, 2024
a24a32d
Handle agent calls
tsunghan-wu Oct 24, 2024
0593516
Use native multi turn instead
Jiayi-Pan Oct 26, 2024
335397f
Fix bugs in conversation files
tsunghan-wu Oct 27, 2024
56c0537
web search + refactor agent setup
mihranmiroyan Oct 27, 2024
27d5943
update frontend ui, replace if to while loop to support multiple sear…
tsunghan-wu Oct 28, 2024
2600aa9
Merge branch 'lm-sys:main' into main
xyVickyHu Oct 28, 2024
6c073c2
Merge branch 'lm-sys:main' into handle_function_native_multiturn
xyVickyHu Oct 29, 2024
b50ad0c
Merge pull request #1 from tsunghan-wu/handle_function_native_multiturn
xyVickyHu Oct 29, 2024
cb388ea
Fix bugs for unstable llm generation
tsunghan-wu Nov 2, 2024
1a4f351
Merge branch 'handle_function_native_multiturn' of github.com:tsungha…
tsunghan-wu Nov 2, 2024
f9f2e7a
Stabalize the pipeline: (1) force the model to take at most one actio…
tsunghan-wu Nov 5, 2024
a683934
update readme
tsunghan-wu Nov 5, 2024
e43a303
add agent comparison
Jiayi-Pan Nov 6, 2024
c18ed47
add docs on agent compariosn
Jiayi-Pan Nov 6, 2024
2727002
side-by-side agents
mihranmiroyan Nov 6, 2024
1007183
add judge on agent comparison
Nov 6, 2024
a341194
Merge pull request #2 from tsunghan-wu/multi-agent
mihranmiroyan Nov 11, 2024
fce7b28
Merge pull request #3 from tsunghan-wu/handle_function_native_multiturn
mihranmiroyan Nov 11, 2024
6f8ba28
update response
mihranmiroyan Nov 11, 2024
fccbf64
Merge pull request #4 from tsunghan-wu/multi-agent
mihranmiroyan Nov 11, 2024
713254c
you search + firecrawl
mihranmiroyan Nov 11, 2024
d0d08c6
Fix the return format
tsunghan-wu Nov 11, 2024
6b9cc8b
dataset and eval script
mihranmiroyan Nov 13, 2024
dbf1efb
use openai_api to do web search
Nov 15, 2024
b042f20
remove unnecessary files
tsunghan-wu Nov 15, 2024
9befb68
Finish openai agent with a more general interface
tsunghan-wu Nov 15, 2024
e7bfeeb
merge openai's function call api with web-search
tsunghan-wu Nov 15, 2024
337630a
Completely merge the function api call with web search
tsunghan-wu Nov 15, 2024
1c7cefa
update .gitignore
mihranmiroyan Nov 15, 2024
875724f
function calling api evals
mihranmiroyan Nov 15, 2024
80bad27
Merge pull request #5 from tsunghan-wu/web-search
mihranmiroyan Nov 18, 2024
8f5292b
new eval sets
mihranmiroyan Nov 18, 2024
f021491
gpt-4 labels for arena hard
mihranmiroyan Nov 18, 2024
afe5461
Finish Anthropic function calling support
tsunghan-wu Nov 18, 2024
8cefdd1
Finish nvidia llama 3.1 function call api
tsunghan-wu Nov 18, 2024
63e2193
Finish gemini, anthropic, llama3.1, and gpt-4o
tsunghan-wu Nov 19, 2024
5ee7994
Merge pull request #6 from tsunghan-wu/more_api_support
tsunghan-wu Nov 21, 2024
912a694
initial sampling
mihranmiroyan Nov 21, 2024
b0e210c
small arena hard
mihranmiroyan Nov 21, 2024
f7ce345
eval samples
mihranmiroyan Nov 21, 2024
4c3f10e
eval scripts
mihranmiroyan Nov 29, 2024
f71493f
Add multiple output files
tsunghan-wu Dec 1, 2024
9372c1f
add arena hard evaluation table
tsunghan-wu Dec 1, 2024
8b7d00d
perplexity samples + results
mihranmiroyan Dec 1, 2024
2ada366
gpt-4o serp
mihranmiroyan Dec 1, 2024
31407f4
gpt-4o-mini + serp
mihranmiroyan Dec 1, 2024
aae2b05
serp + arena hard
mihranmiroyan Dec 1, 2024
bca62e7
update results and new evaluation script
tsunghan-wu Dec 2, 2024
da93299
update arena hard evaluation for serp apis
tsunghan-wu Dec 2, 2024
e9173a6
update claude results
tsunghan-wu Dec 2, 2024
3e4a53d
add gemini agent
tsunghan-wu Dec 2, 2024
8be9f2b
add arenahard_analysis
Jiayi-Pan Dec 2, 2024
f9ed148
gemini/claude eval results + visualization notebook
mihranmiroyan Dec 2, 2024
d257cf8
Merge branch 'main' of https://github.com/tsunghan-wu/Agent_FastChat
mihranmiroyan Dec 2, 2024
31d5751
remove network errors, add qualitative examples
mihranmiroyan Dec 3, 2024
bec5042
add gemini flash result
tsunghan-wu Dec 3, 2024
59079ec
rendering changes, attempting to stream FC
mihranmiroyan Dec 4, 2024
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,16 +1,20 @@
# Python
agent_api_endpoints.json
__pycache__
*.pyc
*.egg-info
dist
.venv
keys.env

# Log
*.log
*.log.*
*.json
!playground/deepspeed_config_s2.json
!playground/deepspeed_config_s3.json
!evaluation/**/*.json
!evaluation/**/*.csv

# Editor
.idea
@@ -21,10 +25,12 @@ dist
wandb
output
checkpoints_flant5_3b
.gradio/

# Data
*.pkl
*.csv
!evaluation/**/*.csv
tests/state_of_the_union.txt

# Build
8 changes: 8 additions & 0 deletions README.md
@@ -1,4 +1,12 @@
# FastChat

## Latest Update and TODOs

- [ ] Enable Google search function call (by 10/28/2024)
- [x] Modify the FastChat codebase to support function calling during chat for OpenAI GPT-4. See [`docs/agent.md`](./docs/agent.md) for details.
- [x] Complete the Google search function. Currently, it is a prototype function at [`fastchat/tools/search.py`](./fastchat/tools/search.py).
- [ ] Make the agent call scalable for more LLMs (in addition to OpenAI's API models).

| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/6GXcFg3TH8) | [**X**](https://x.com/lmsysorg) |

FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
86 changes: 86 additions & 0 deletions docs/agent.md
@@ -0,0 +1,86 @@
# Agent Arena Working Area

## The latest status

- Done:
- [x] Complete the basic google search function in `fastchat/tools/search.py`. The pipeline now works for OpenAI search.
- [x] Find some successful and failure cases using our naive search tool.
- TODOs:
  - [ ] Add an option to show or hide the web search result (UI-related work).
  - [ ] Extend the search functions to other LLMs.
- [ ] Run our pipeline on Arena Datasets to see if this naive search is sufficient.

- Note: Please run `./format.sh` before merging into the main branch.

**Note**: Please install packages and ensure you can successfully execute [Launch a WebUI with an API Model](https://github.com/tsunghan-wu/Agent_FastChat/blob/main/docs/model_support.md#api-based-models).

## Launch agent-enabled Chatbot Arena (for OpenAI APIs currently)

1. Specify the endpoint information in a JSON configuration file. For instance, create a file named `agent_api_endpoints.json`:

```json
{
"gpt4o": {
"model_name": "gpt-4o-2024-08-06",
"api_type": "openai",
"api_base": "https://api.openai.com/v1",
"api_key": "sk-******",
"anony_only": false,
"recommended_config": {
"temperature": 0.7,
"top_p": 1.0
},
"text-arena": true,
"vision-arena": false,
"agent-mode": true
}
}
```

2. Launch the Gradio web server with the argument `--register agent_api_endpoints.json`:

```bash
python3 -m fastchat.serve.gradio_web_server_agent --controller "" --share --register agent_api_endpoints.json
```

Now, you can open a browser and interact with the model.
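
Before launching, it can help to sanity-check the registry file. A minimal sketch; the required-key list below is an assumption drawn from the example entry, not a schema the codebase enforces:

```python
import json

# Keys the example endpoint entry above carries. Treat this set as an
# assumption based on the sample config, not an enforced schema.
REQUIRED_KEYS = {"model_name", "api_type", "api_base", "api_key"}

def validate_registry(path):
    """Load an endpoint registry and report entries missing required keys."""
    with open(path) as f:
        registry = json.load(f)
    problems = {}
    for name, cfg in registry.items():
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            problems[name] = sorted(missing)
    return problems

# An empty dict means every entry carries the required keys:
# validate_registry("agent_api_endpoints.json")
```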

## Examples:

1. Using Agents

```
User: What's the weather today?

GPT-4:
{
"thought": "The query asks about the current weather, which is dynamic and location-specific information. Since I don't have direct access to real-time weather data, I should use the available tool to search for the current weather conditions. The 'google_search' tool can help find this information by querying with relevant keywords.",
"action": {
"name": "google_search",
"reason": "To obtain the current weather information from online sources.",
"arguments": {
"key_words": ["current weather"],
"topk": 1
}
}
}

Then we execute the google_search function in fastchat/tools/search.py, which currently returns only a pre-defined string. Follow-up (multi-turn) chat also works.
```

2. Without using agents

```
User: How are you?
GPT-4:
{
"thought": "The query is asking for a status update or well-being check on myself as an assistant. This is a common conversational question and doesn't require additional information from external sources. I can answer this directly based on my designed functionality.",
"answer": "I'm just a virtual assistant, so I don't have feelings or states of being, but I'm here and ready to help you with any questions or tasks you have!"
}
```
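
The two response shapes above (an `action` to execute versus a direct `answer`) can be routed with a small helper. A sketch, assuming the model reliably emits one of the two JSON forms; `run_search` here is a hypothetical stand-in for the tool in `fastchat/tools/search.py`:

```python
import json

def run_search(key_words, topk=1):
    # Hypothetical stand-in for fastchat/tools/search.py, which currently
    # returns a pre-defined string rather than live results.
    return f"[search results for {key_words!r}, top {topk}]"

def dispatch(model_output: str) -> str:
    """Route a model reply: execute its requested action, or return its answer."""
    reply = json.loads(model_output)
    if "action" in reply:
        action = reply["action"]
        if action["name"] == "google_search":
            args = action["arguments"]
            return run_search(args["key_words"], args.get("topk", 1))
        raise ValueError(f"unknown action: {action['name']}")
    return reply["answer"]
```

In the real pipeline the search result is fed back to the model for another turn; this sketch stops at the tool call.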

## Comparing Responses Between Agent and Non-Agent Modes

You can use the `compare_agents.ipynb` notebook to compare responses between a standard LM and one augmented with our search tool:
1. Start the server as usual
2. Run the notebook
121 changes: 121 additions & 0 deletions evaluation/archive/compare_agents.ipynb
@@ -0,0 +1,121 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"from random import randint\n",
"import json\n",
"\n",
"# Login using e.g. `huggingface-cli login` to access this dataset\n",
"ds = load_dataset(\"lmsys/lmsys-chat-1m\")['train']\n",
    "sample_idxs = [randint(0, len(ds) - 1) for _ in range(300)]\n",
"samples = [ds[i] for i in sample_idxs]\n",
"single_turn_samples = [s for s in samples if len(s['conversation']) == 2]\n",
"prompts = [s['conversation'][0]['content'] for s in single_turn_samples]\n",
"with open('prompts.json', 'w') as f:\n",
" json.dump(prompts, f, indent=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"prompts = json.load(open(\"prompts.json\"))[:100]\n",
"server_url = \"https://e1f18acc28cf24eea6.gradio.live/\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from openai import OpenAI\n",
"def get_response_standard(prompt):\n",
" system_prompt = \"You are a helpful assistant.\"\n",
" client = OpenAI()\n",
" completion = client.chat.completions.create(\n",
" model=\"gpt-4o\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": system_prompt},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" temperature=1.0,\n",
" top_p=0.7,\n",
" max_tokens=512,\n",
" )\n",
"\n",
" return completion.choices[0].message.content\n",
"\n",
"\n",
"\n",
"from gradio_client import Client\n",
"def get_response_agent(prompt):\n",
    "    client = Client(server_url)\n",
" result = client.predict(\n",
" model_selector=\"react-agent\",\n",
" text=prompt,\n",
" api_name=\"/add_text_1\")\n",
" out = client.predict(\n",
" temperature=1.0,\n",
" top_p=0.7,\n",
" max_new_tokens=512,\n",
" api_name=\"/bot_response_2\"\n",
" )\n",
" return out[0][1]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tqdm\n",
"responses = []\n",
"for prompt in tqdm.tqdm(prompts):\n",
" agent_response = get_response_agent(prompt)\n",
" standard_response = get_response_standard(prompt)\n",
" responses.append({\"prompt\": prompt, \"agent_response\": agent_response, \"standard_response\": standard_response})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"responses.json\", \"w\") as f:\n",
" json.dump(responses, f, indent=2)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}