Computer Agent Arena Hub: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks

xlang-ai/computer-agent-arena-hub

📖 Overview

The Computer Agent Arena is an open-ended evaluation platform for benchmarking LLM/VLM-based AI agents on real-world computer tasks across diverse domains, ranging from general desktop operations to specialized workflows such as programming, data analysis, and multimedia editing.

This repository hosts the source code for all agents supported on the platform and serves as the foundation for integrating additional agents into the Computer Agent Arena ecosystem.

💾 Getting Started

Installation

  1. Clone the Repository:

    git clone git@github.com:xlang-ai/Computer-Agent-Arena.git
    cd Computer-Agent-Arena
  2. Install Dependencies:

    pip install -r requirements.txt
  3. Set Up Environment Variables: Copy the example environment configuration file and fill in the necessary credentials:

    cp .env.example .env

    Example .env file:

     export API_KEYS="{\
     \"OPENAI_API_KEY\":\"YOUR_KEY_HERE\",\
     \"GENAI_API_KEY\":\"YOUR_KEY_HERE\",\
     \"ANTHROPIC_API_KEY\":\"YOUR_KEY_HERE\"\
     }"
    
  4. Run Tests: Test the default agents to ensure the setup is working:

    # Load the environment variables from .env
    source .env
    
    # Run the tests
    python test/test_agents.py

If the tests pass, your environment is ready!
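The example `.env` file in step 3 stores all credentials as a single JSON object in the `API_KEYS` environment variable. As a minimal sketch of how such a variable could be read from Python, the helper below parses it into a dictionary. The function name `load_api_keys` is hypothetical and is not taken from this repository; the agents in the codebase may read their keys differently.

```python
import json
import os


def load_api_keys(var_name: str = "API_KEYS") -> dict:
    """Parse the JSON object stored in an environment variable.

    Hypothetical helper for illustration; not part of this repository.
    """
    raw = os.environ.get(var_name)
    if not raw:
        raise RuntimeError(f"{var_name} is not set; run `source .env` first")
    try:
        keys = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"{var_name} does not contain valid JSON: {exc}") from exc
    if not isinstance(keys, dict):
        raise RuntimeError(f"{var_name} must be a JSON object mapping names to keys")
    return keys


if __name__ == "__main__":
    api_keys = load_api_keys()
    # Print only the key names so that no secrets end up in the terminal log.
    print(sorted(api_keys))
```

Running this after `source .env` should list the key names defined in the file; a missing or malformed variable raises a `RuntimeError` with a hint instead of failing later inside an agent.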

Implement Customized Agents

🤝 Want to contribute?
Check out our Contributing Guide to learn how you can plug in your agent to improve Computer Agent Arena!

Test Customized Agents

Once you have implemented your agent, you can add it to the test suite by adding a new test function in test/test_agents.py.

# Load the environment variables from .env
source .env

# Run the tests; they should all pass
python test/test_agents.py
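As a rough illustration of what a new test function might look like, the sketch below defines a stand-in agent and a test for it. Everything here is an assumption: `EchoAgent`, its `predict` method, and `test_echo_agent` are hypothetical names, and the actual agent interface and the conventions used in `test/test_agents.py` are defined by the repository and the Contributing Guide.

```python
class EchoAgent:
    """Hypothetical stand-in agent; replace with your real agent class."""

    def predict(self, instruction: str) -> str:
        # A real agent would query a model here; this one just echoes the task.
        return f"DONE: {instruction}"


def test_echo_agent() -> None:
    """Check that the agent returns a non-empty action for a simple task."""
    agent = EchoAgent()
    action = agent.predict("Open the terminal")
    assert isinstance(action, str) and action, "agent should return a non-empty action"
    assert action.startswith("DONE"), "unexpected action format"


if __name__ == "__main__":
    test_echo_agent()
    print("test_echo_agent passed")
```

Keeping each agent's check in its own function makes a failure point directly at the agent that caused it.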

After testing, submit a pull request (PR) with your implementation. Refer to the Contributing Guide for detailed instructions.

Once your PR is submitted, email us for further details.

We really appreciate any contributions that improve Computer Agent Arena! If you have any questions, feel free to open an issue or contact us via email.

📄 Citation

If you find this project useful, please consider citing it:

@misc{ComputerAgentArena2025,
  title={Computer Agent Arena: Compare & Test AI Agents on Crowdsourced Real-World Computer Use Tasks},
  url={https://arena.xlang.ai},
  year={2025}
}