This is a Python implementation of Cerebellum, a lightweight browser automation agent that accomplishes user-defined goals using keyboard and mouse actions.
Goal: Show me the wikipedia page of the creator of Bitcoin
python_bitcoin.mp4
-
Install the package from pypi.
pip install cerebellum
-
Set up your Anthropic API key:
export ANTHROPIC_API_KEY='your-api-key'
-
Import module and use.
from seleniumbase import get_driver from cerebellum import AnthropicPlanner, BrowserAgent, BrowserAgentOptions, pause_for_input def main(): driver = get_driver() try: # Set your starting page driver.get("https://www.google.com") # Define your goal goal = "Show me the wikipedia page of the creator of Bitcoin" # Create the Cerebellum browser agent planner = AnthropicPlanner() options = BrowserAgentOptions(pause_after_each_action=True) agent = BrowserAgent(driver, planner, goal, options) agent.pause_after_each_action = False pause_for_input() # Have Cerebellum takeover website navigation agent.start() # Goal has now been reached, you may interact with the Selenium driver any way you want pause_for_input() finally: driver.quit() if __name__ == "__main__": main()
The example script demonstrates searching for Bitcoin's creator on Google and navigating to their Wikipedia page:
python examples/google.py
The Python implementation consists of three main modules:
- Contains the core
BrowserAgent
class that manages browser automation - Handles state management and action execution
- Coordinates between the webdriver and action planner
- Provides utility functions for keyboard input parsing
- Handles terminal input for interactive sessions
- Converts between different coordinate systems
- Implements the
AnthropicPlanner
class using Claude 3.5 Sonnet - Manages communication with Anthropic's API
- Processes screenshots and coordinates browser actions