ServiceNow/BrowserGym

🛠️ Setup - 🏋 Usage - 💻 Demo - 🌐 Ecosystem - 🚀 AgentLab - 🌟 Contributors - 📄 Paper - 📝 Citation

pip install browsergym

Warning

BrowserGym is meant to provide an open, easy-to-use and extensible framework to accelerate the field of web agent research. It is not meant to be a consumer product. Use with caution!

Tip

🚀 Check out AgentLab ✨! A seamless framework to implement, test, and evaluate your web agents on all BrowserGym benchmarks.

Video (4x4.grid.mp4): example of a GPT4-V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row).

BrowserGym includes the following benchmarks by default:

  • MiniWoB
  • WebArena
  • VisualWebArena
  • WorkArena
  • AssistantBench
  • WebLINX

Designing new web benchmarks with BrowserGym is easy and only requires inheriting from the AbstractBrowserTask class.

🛠️ Setup

To use browsergym, install one of the following packages:

pip install browsergym  # (recommended) everything below
pip install browsergym-experiments  # experiment utilities (agent, loop, benchmarks) + everything below
pip install browsergym-core  # core functionalities only (no benchmark, just the openended task)
pip install browsergym-miniwob  # core + miniwob
pip install browsergym-webarena  # core + webarena
pip install browsergym-visualwebarena  # core + visualwebarena
pip install browsergym-workarena  # core + workarena
pip install browsergym-assistantbench  # core + assistantbench
pip install weblinx-browsergym  # core + weblinx

Then set up Playwright by running:

playwright install chromium

Finally, each benchmark comes with its own specific setup, which requires additional steps.

🏗️ Development setup

To install browsergym locally for development, use the following commands:

git clone git@github.com:ServiceNow/BrowserGym.git
cd BrowserGym
make install

Contributions are welcome! 😊

🏋 Usage

Boilerplate code to run an agent on an interactive, open-ended task:

import gymnasium as gym
import browsergym.core  # register the openended task as a gym environment

# start an openended environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
# run the environment <> agent loop until termination
obs, info = env.reset()
while True:
    action = ...  # implement your agent here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
# release the environment
env.close()
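The `action = ...` placeholder above is where an agent decides what to do. The sketch below is a minimal, rule-based stand-in, not an actual BrowserGym agent. It assumes that the observation carries a `chat_messages` list of `{"role": ..., "message": ...}` dictionaries and that actions are strings such as `noop()` or `send_msg_to_user("...")`. The function name `choose_action` is hypothetical.

```python
import json


def choose_action(obs: dict) -> str:
    """Return an action string for the given observation.

    Assumptions (not confirmed by this README): obs["chat_messages"] is a list
    of {"role": ..., "message": ...} dicts, and actions are strings such as
    noop() or send_msg_to_user("...").
    """
    messages = obs.get("chat_messages", [])
    if messages and messages[-1].get("role") == "user":
        # Echo the latest user message back to the chat.
        reply = "You said: " + messages[-1].get("message", "")
        # json.dumps yields a double-quoted, escaped string literal.
        return f"send_msg_to_user({json.dumps(reply)})"
    # Nothing to answer: do nothing for this step.
    return "noop()"
```

With this helper, the placeholder line in the loop would become `action = choose_action(obs)`. A real agent would typically replace the echo rule with a call to a language model.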

MiniWoB

import gymnasium as gym
import browsergym.miniwob  # register miniwob tasks as gym environments

# start a miniwob task
env = gym.make("browsergym/miniwob.choose-list")
...

# list all the available miniwob tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/miniwob")]
print("\n".join(env_ids))

WorkArena

import gymnasium as gym
import browsergym.workarena  # register workarena tasks as gym environments

# start a workarena task
env = gym.make("browsergym/workarena.servicenow.order-ipad-pro")
...

# list all the available workarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/workarena")]
print("\n".join(env_ids))

WebArena

import gymnasium as gym
import browsergym.webarena  # register webarena tasks as gym environments

# start a webarena task
env = gym.make("browsergym/webarena.310")
...

# list all the available webarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/webarena")]
print("\n".join(env_ids))

VisualWebArena

import gymnasium as gym
import browsergym.visualwebarena  # register visualwebarena tasks as gym environments

# start a visualwebarena task
env = gym.make("browsergym/visualwebarena.721")
...

# list all the available visualwebarena tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/visualwebarena")]
print("\n".join(env_ids))

AssistantBench

import gymnasium as gym
import browsergym.assistantbench  # register assistantbench tasks as gym environments

# start an assistantbench task
env = gym.make("browsergym/assistantbench.validation.3")
...

# list all the available assistantbench tasks
env_ids = [id for id in gym.envs.registry.keys() if id.startswith("browsergym/assistantbench")]
print("\n".join(env_ids))
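The listing snippet repeated in the sections above can be factored into one small helper. This is only a sketch: `filter_env_ids` is a hypothetical name, and it assumes every benchmark task ID has the form `browsergym/<benchmark>.<task>`, as in the examples above. The `browsergym/openended` ID has no such suffix and is therefore never matched.

```python
def filter_env_ids(env_ids, benchmark):
    """Return the sorted environment IDs that belong to one benchmark.

    Assumes IDs look like "browsergym/<benchmark>.<task>".
    """
    prefix = f"browsergym/{benchmark}."
    return sorted(env_id for env_id in env_ids if env_id.startswith(prefix))
```

After importing the relevant benchmark package, it could be called as `filter_env_ids(gym.envs.registry.keys(), "miniwob")`.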

💻 Demo

If you want to experiment with a demo agent in BrowserGym, follow these steps:

# conda setup
conda env create -f demo_agent/environment.yml
conda activate demo_agent

# or pip setup
pip install -r demo_agent/requirements.txt

# then download the browser for playwright
playwright install chromium

Our demo agent uses OpenAI as a backend, so be sure to set your OPENAI_API_KEY environment variable.
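A missing key usually only shows up once the first model call fails, so it can help to check for it up front. The helper below is a generic sketch and not part of the demo agent. `require_env` is a hypothetical name.

```python
import os


def require_env(name: str, environ=os.environ) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before launching the demo agent.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the start of a script fails fast with a clear message instead of an authentication error later on.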

Launch the demo agent as follows:

# openended (interactive chat mode)
python demo_agent/run_demo.py --task_name openended --start_url https://www.google.com

# miniwob
python demo_agent/run_demo.py --task_name miniwob.click-test

# workarena
python demo_agent/run_demo.py --task_name workarena.servicenow.order-standard-laptop

# webarena
python demo_agent/run_demo.py --task_name webarena.4

# visualwebarena
python demo_agent/run_demo.py --task_name visualwebarena.398

You can customize your experience by changing the model_name to your preferred LLM (it uses gpt-4o-mini by default), adding screenshots for your VLMs with use_screenshot, and much more!

python demo_agent/run_demo.py --help

🌐 Ecosystem

  • AgentLab: Seamlessly run agents on benchmarks, collect and analyse traces.
  • WorkArena(++): A benchmark for web agents on the ServiceNow platform.
  • WebArena: A benchmark of realistic web tasks on self-hosted domains.
  • VisualWebArena: A benchmark of realistic visual web tasks on self-hosted domains.
  • MiniWoB(++): A collection of over 100 web tasks on synthetic web pages.
  • WebLINX: A dataset of real-world web interaction traces.
  • AssistantBench: A benchmark of realistic and time-consuming tasks on the open web.

🌟 Contributors

BrowserGym contributors

📝 Citing This Work

Please use the following BibTeX to cite our work:

@inproceedings{workarena2024,
    title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
    author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
    booktitle = {Proceedings of the 41st International Conference on Machine Learning},
    pages = {11642--11662},
    year = {2024},
    editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
    volume = {235},
    series = {Proceedings of Machine Learning Research},
    month = {21--27 Jul},
    publisher = {PMLR},
    url = {https://proceedings.mlr.press/v235/drouin24a.html},
}