
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

📑 Paper    |    🌐 Project Page    |    💾 AgentTrek Browser-use Trajectories    |    💾 AgentTrek-1.0-32B   

Introduction

AgentTrek is a cost-efficient and scalable framework that synthesizes high-quality agent trajectories by guiding replay with web tutorials. The resulting trajectories significantly enhance agent performance when used for training.

Key Features & Contributions

  • 🔄 Scalable Data Synthesis Pipeline: A cost-efficient and scalable pipeline for synthesizing high-quality agent trajectories.
  • 📊 Comprehensive Dataset: The largest-scale dataset of browser-use agent trajectories with multimodal grounding and reasoning.
  • 🤖 Capable Browser-use Agent: A fully autonomous browser-use agent capable of performing general tasks.

🌐 BrowserGym Leaderboard (task success rate, %)

Agent               WebArena   WorkArena-L1   WorkArena-L2   WorkArena-L3   MiniWoB
Claude-3.5-Sonnet   36.20      56.40          39.10          0.40           69.80
GPT-4o              31.40      45.50          8.50           0.00           63.80
GPT-o1-mini         28.60      56.70          14.90          0.00           67.80
Llama-3.1-405b      24.00      43.30          7.20           0.00           64.60
AgentTrek-32b 💫    22.40      38.29          2.98           0.00           60.00
Llama-3.1-70b       18.40      27.90          2.10           0.00           57.60
GPT-4o-mini         17.40      27.00          1.30           0.00           56.60

Our browser-use agent demonstrates strong performance in real-world online scenarios.

Getting Started

Installation

  1. Clone the repository (an HTTPS alternative is sketched after these steps):
git clone git@github.com:xlang-ai/AgentTrek.git
cd AgentTrek
  2. Create and activate a conda environment:
conda create -n agenttrek python=3.10
conda activate agenttrek
  3. Install PyTorch and the remaining dependencies:
pip install -e .
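
If you have not configured SSH keys for GitHub, cloning over HTTPS is an equivalent alternative:

# Alternative to the SSH clone above
git clone https://github.com/xlang-ai/AgentTrek.git
cd AgentTrek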

Data Preparation

Training

Model Checkpoints

  • AgentTrek-7B: cooking 🧑‍🍳
  • AgentTrek-32B: model (a local serving sketch follows below)
  • AgentTrek-72B: cooking 🧑‍🍳
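
The evaluation scripts below reach the model through an OpenAI-compatible endpoint (OPENAI_BASE_URL), so one way to evaluate a released checkpoint is to serve it locally, for example with vLLM. A minimal sketch; the model ID xlang-ai/AgentTrek-1.0-32B is an assumption here, so substitute the actual checkpoint linked at the top of this README:

pip install vllm
# Assumption: the model ID below matches the released AgentTrek-1.0-32B checkpoint
python -m vllm.entrypoints.openai.api_server \
    --model xlang-ai/AgentTrek-1.0-32B \
    --port 8000
# The server now exposes an OpenAI-compatible API at http://localhost:8000/v1,
# which is the value to use for OPENAI_BASE_URL in the evaluation scripts.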

Evaluation

MiniWoB++ Evaluation

  1. Configure your evaluation settings (an example configuration is sketched after these steps):

    • Open scripts/run_miniwob.sh
    • Set the OPENAI_API_KEY variable to your OpenAI API key
    • Set the AGENTLAB_EXP_ROOT variable to specify the path for the results
    • Set the MINIWOB_URL variable to the URL of your MiniWoB++ server
    • Set the OPENAI_BASE_URL variable to specify your model's base URL
  2. Start inference:

bash scripts/run_miniwob.sh
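
The variables above are ordinary shell exports inside scripts/run_miniwob.sh. A minimal sketch, with every value illustrative rather than real:

# Illustrative values only; substitute your own key, path, and URLs
export OPENAI_API_KEY="sk-..."                        # your OpenAI API key
export AGENTLAB_EXP_ROOT="$HOME/agenttrek_results"    # where results are written
export MINIWOB_URL="http://localhost:8080/miniwob/"   # your MiniWoB++ server
export OPENAI_BASE_URL="http://localhost:8000/v1"     # endpoint serving your model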

WebArena Evaluation

  1. Configure your evaluation settings (an example configuration is sketched after these steps):

    • Open scripts/run_webarena.sh
    • Set the OPENAI_API_KEY variable to your own OpenAI API key
    • Set the AGENTLAB_EXP_ROOT variable to specify the path for the results
    • Set the following URL variables for the relevant platforms:
      • BASE_URL: Specify the base URL for your WebArena instance
      • WA_SHOPPING: URL for the Shopping benchmark
      • WA_SHOPPING_ADMIN: URL for the Shopping Admin benchmark
      • WA_REDDIT: URL for the Reddit benchmark
      • WA_GITLAB: URL for the GitLab benchmark
      • WA_WIKIPEDIA: URL for the Wikipedia benchmark
      • WA_MAP: URL for the Map benchmark
      • WA_HOMEPAGE: URL for the Homepage benchmark
    • Set the OPENAI_BASE_URL variable to specify your model base URL
  2. Start inference:

bash scripts/run_webarena.sh
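
As with MiniWoB++, these are shell exports inside scripts/run_webarena.sh. A minimal sketch, assuming a self-hosted WebArena instance on its commonly used default ports; every hostname and port is illustrative:

# Illustrative values only; substitute your own key, path, and deployment URLs
export OPENAI_API_KEY="sk-..."
export AGENTLAB_EXP_ROOT="$HOME/agenttrek_results"
BASE_URL="http://your-webarena-host"
export WA_SHOPPING="${BASE_URL}:7770"              # Shopping
export WA_SHOPPING_ADMIN="${BASE_URL}:7780/admin"  # Shopping Admin
export WA_REDDIT="${BASE_URL}:9999"                # Reddit
export WA_GITLAB="${BASE_URL}:8023"                # GitLab
export WA_WIKIPEDIA="${BASE_URL}:8888"             # Wikipedia
export WA_MAP="${BASE_URL}:3000"                   # Map
export WA_HOMEPAGE="${BASE_URL}:4399"              # Homepage
export OPENAI_BASE_URL="http://localhost:8000/v1"  # endpoint serving your model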
