Skip to content

Commit

Permalink
fix readme card_game path
Browse files Browse the repository at this point in the history
  • Loading branch information
lr-tsinghua11 committed Oct 20, 2023
1 parent 84cceb3 commit 7ba6cbf
Show file tree
Hide file tree
Showing 2 changed files with 22 additions and 22 deletions.
28 changes: 14 additions & 14 deletions README-zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@

## AgentInstruct

**AgentInstruct** 是一个精挑细选的智能体数据集,包含 **1866** 个高质量交互、**6** 个多样化的真实场景任务,用于增强语言模型的 Agent 能力,有如下特性
**AgentInstruct** 是一个经过挑选的智能体数据集,包含 **1866** 个高质量交互、**6** 个多样化的真实场景任务,用于增强语言模型的 Agent 能力,有如下特性

- 🔍 **思维链** - 采用 [ReAct](http://arxiv.org/abs/2210.03629) 提示词策略,为每步操作提供详细的思维链,深入理解模型决策过程

Expand All @@ -31,7 +31,7 @@

## AgentLM

**AgentLM** 由 Llama2-chat 开源模型系列在 **AgentInstruct****ShareGPT** 混合数据集上微调得到。模型遵循 [Llama-2-chat](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) 的对话格式,系统提示词固定为 `You are a helpful, respectful and honest assistant`
**AgentLM** 由 Llama2-chat 开源模型系列在 **AgentInstruct****ShareGPT** 混合数据集上微调得到。模型遵循 [Llama-2-chat](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) 的对话格式,系统提示词固定为 `You are a helpful, respectful and honest assistant.`

7B、13B 和 70B 模型开源网址如下

Expand Down Expand Up @@ -61,30 +61,30 @@ curl 127.0.0.1:30070/generate \
# {"generated_text":"Hello! How can I help you today? "}
```

如果 GPU 充足,可在 docker compose 文件后面增加更多端口,产生多个推理实例。
可在 docker compose 文件后面增加更多端口,产生多个推理实例。

## 评测

模型评测包含 6 个 held-in 任务、6 个 held-out 任务、通用任务

### Held-in 任务

6 个保留任务来源于 [**AgentBench**](https://github.com/THUDM/AgentBench)。 但是,由于 AgentBench 仍在活跃开发,最新版本可能无法完全重现论文中报告的结果。
6 个保留任务来源于 [**AgentBench**](https://github.com/THUDM/AgentBench)。 但是,由于 AgentBench 仍在开发中,最新版本可能无法完全重现论文中报告的结果。

本项目有关评测代码位于`./AgentBench.old` 文件夹中。

### Held-out 任务

Held-out 任务来源于以下开源框架

| 任务 | AgentTuning 评测脚本 | 原始仓库 |
| ----------------- | ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| SciWorld | [📂 eval_heldout/science-world](eval_heldout/science-world/) | [💻 allenai/ScienceWorld](https://github.com/allenai/ScienceWorld) |
| MiniWoB++ | [📂 eval_heldout/miniwob++](eval_heldout/miniwob++) | [💻 Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) |
| HotpotQA | [📂 eval_heldout/hotpotQA](eval/held_out/hotpotQA) | [💻 salesforce/BOLAA](https://github.com/salesforce/BOLAA) |
| ReWOO | [📂 eval_heldout/rewoo](eval_heldout/rewwo/) | [💻 billxbf/ReWOO](https://github.com/billxbf/ReWOO) |
| WebArena | [📂 eval_heldout/webarena](eval_heldout/webarena/) | [💻 web-arena-x/webarena](https://github.com/web-arena-x/webarena) |
| Digital Card Game | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) ( _Extend_ Split ) | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) |
| 任务 | AgentTuning 评测脚本 | 原始仓库 |
| ----------------- | ----------------------------------------------------------- | ------------------------------------------------------------ |
| SciWorld | [📂 eval_heldout/science-world](eval_heldout/science-world/) | [💻 allenai/ScienceWorld](https://github.com/allenai/ScienceWorld) |
| MiniWoB++ | [📂 eval_heldout/miniwob++](eval_heldout/miniwob++) | [💻 Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) |
| HotpotQA | [📂 eval_heldout/hotpotQA](eval/held_out/hotpotQA) | [💻 salesforce/BOLAA](https://github.com/salesforce/BOLAA) |
| ReWOO | [📂 eval_heldout/rewoo](eval_heldout/rewwo/) | [💻 billxbf/ReWOO](https://github.com/billxbf/ReWOO) |
| WebArena | [📂 eval_heldout/webarena](eval_heldout/webarena/) | [💻 web-arena-x/webarena](https://github.com/web-arena-x/webarena) |
| Digital Card Game | [💻 AgentBench.old](./AgentBench.old) ( _Extend_ Split ) | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) |

### 通用任务

Expand All @@ -111,7 +111,7 @@ Held-out 任务来源于以下开源框架
python eval_general/evaluate_gsm8k_tgi.py --port 30070
```

使用 `--sample-input-file` 可以加载本地数据,否则脚本会下载 [GSM8K](https://huggingface.co/datasets/gsm8k)  到本地
使用 `--sample-input-file` 可以加载本地数据,否则脚本会下载 [GSM8K](https://huggingface.co/datasets/gsm8k) 到本地

**MT-Bench 配置**

Expand All @@ -138,7 +138,7 @@ Held-out 任务来源于以下开源框架

## 引用

如果你觉得我们的工作有益,请引用 AgentTuning:
如果你觉得我们的工作有帮助的话,请考虑引用下列论文

```
@misc{zeng2023agenttuning,
Expand Down
16 changes: 8 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,14 +76,14 @@ The 6 held-in tasks are selected from [**AgentBench**](https://github.com/THUDM/

Held-out tasks are recompiled from the following frameworks:

| Task | AgentTuning Setup | Original Repo |
| ----------------- | ----------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| SciWorld | [📂 eval_heldout/science-world](eval_heldout/science-world/) | [💻 allenai/ScienceWorld](https://github.com/allenai/ScienceWorld) |
| MiniWoB++ | [📂 eval_heldout/miniwob++](eval_heldout/miniwob++) | [💻 Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) |
| HotpotQA | [📂 eval_heldout/hotpotQA](eval/held_out/hotpotQA) | [💻 salesforce/BOLAA](https://github.com/salesforce/BOLAA) |
| ReWOO | [📂 eval_heldout/rewoo](eval_heldout/rewwo/) | [💻 billxbf/ReWOO](https://github.com/billxbf/ReWOO) |
| WebArena | [📂 eval_heldout/webarena](eval_heldout/webarena/) | [💻 web-arena-x/webarena](https://github.com/web-arena-x/webarena) |
| Digital Card Game | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) ( _Extend_ Split ) | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) |
| Task | AgentTuning Setup | Original Repo |
| ----------------- | ----------------------------------------------------------- | ------------------------------------------------------------ |
| SciWorld | [📂 eval_heldout/science-world](eval_heldout/science-world/) | [💻 allenai/ScienceWorld](https://github.com/allenai/ScienceWorld) |
| MiniWoB++ | [📂 eval_heldout/miniwob++](eval_heldout/miniwob++) | [💻 Farama-Foundation/miniwob-plusplus](https://github.com/Farama-Foundation/miniwob-plusplus) |
| HotpotQA | [📂 eval_heldout/hotpotQA](eval/held_out/hotpotQA) | [💻 salesforce/BOLAA](https://github.com/salesforce/BOLAA) |
| ReWOO | [📂 eval_heldout/rewoo](eval_heldout/rewwo/) | [💻 billxbf/ReWOO](https://github.com/billxbf/ReWOO) |
| WebArena | [📂 eval_heldout/webarena](eval_heldout/webarena/) | [💻 web-arena-x/webarena](https://github.com/web-arena-x/webarena) |
| Digital Card Game | [💻 AgentBench.old](./AgentBench.old) ( _Extend_ Split ) | [💻 THUDM/AgentBench](https://github.com/THUDM/AgentBench) |

### General Tasks

Expand Down

0 comments on commit 7ba6cbf

Please sign in to comment.