A lightweight browser-using agent that accomplishes user-defined goals on webpages with keyboard and mouse actions.
Goal: Find a USB C to C cable that is 10 feet long and add it to cart
Please see setup directions for your language:
- Web browsing is simplified to navigating a directed graph.
- Each webpage is a node with visible elements and data.
- User actions, such as clicking or typing, are edges that move between nodes.
- Cerebellum starts at a webpage and aims to reach a target node that embodies the completed goal.
- It uses an LLM to find new nodes by analyzing page content and interactive elements.
- The LLM decides the next action based on the current state and past actions.
- Cerebellum executes the LLM's planned action and feeds the new state back into the LLM for the next step.
- The process ends when the LLM decides the goal has been reached or is unachievable.
Currently, Claude 3.5 Sonnet is the only supported LLM.
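The plan-and-act loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cerebellum's actual implementation: `Action`, `State`, `Planner`, `run_agent`, `ScriptedPlanner`, and the toy `SITE` graph are all hypothetical names, and a scripted planner stands in for the LLM so the example runs without a browser or API key.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Action:
    """An edge in the graph: a user action or a terminal verdict (hypothetical type)."""
    kind: str          # "click", "success", or "failure"
    target: str = ""   # label of the element acted on


@dataclass(frozen=True)
class State:
    """A node in the graph: the current page and its visible elements (hypothetical type)."""
    url: str
    elements: tuple[str, ...]


class Planner(Protocol):
    def plan_action(self, goal: str, state: State, history: list[Action]) -> Action: ...


def run_agent(
    goal: str,
    start: State,
    planner: Planner,
    execute: Callable[[State, Action], State],
    max_steps: int = 20,
) -> tuple[bool, list[Action]]:
    """Ask the planner for an action, execute it, feed the new state back, repeat."""
    state, history = start, []
    for _ in range(max_steps):
        action = planner.plan_action(goal, state, history)
        if action.kind == "success":
            return True, history
        if action.kind == "failure":
            return False, history
        state = execute(state, action)   # follow the edge to the next node
        history.append(action)
    return False, history                # step budget exhausted


class ScriptedPlanner:
    """Stand-in for the LLM: clicks the first element until the cart page is reached."""

    def plan_action(self, goal: str, state: State, history: list[Action]) -> Action:
        if state.url == "/cart":
            return Action("success")
        if not state.elements:
            return Action("failure")     # nothing left to interact with
        return Action("click", state.elements[0])


# Toy site: (current url, clicked element) -> next page.
SITE = {
    ("/search", "10ft USB-C cable"): State("/product", ("Add to cart",)),
    ("/product", "Add to cart"): State("/cart", ()),
}


def execute(state: State, action: Action) -> State:
    return SITE[(state.url, action.target)]


start = State("/search", ("10ft USB-C cable",))
ok, history = run_agent("Add a 10 ft USB-C cable to the cart", start, ScriptedPlanner(), execute)
print(ok, [a.target for a in history])
```

In a real agent the planner would call the LLM with a view of the page and `execute` would drive the browser; the `max_steps` cap is an added safeguard in this sketch, not something the description above specifies.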
- Compatible with any Selenium-supported browser.
- Fills forms using user-provided JSON data.
- Accepts runtime instructions to dynamically adjust browsing strategies and actions.
- TODO: Create training datasets from browsing sessions
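As a rough illustration of the form-filling and runtime-instruction features, the sketch below shows one way user-provided JSON data and extra instructions might be combined into the text handed to the planner. The function name `build_planner_context` and the layout of the resulting text are assumptions made for this example; they are not taken from Cerebellum's API.

```python
import json


def build_planner_context(goal: str, user_data: dict, instructions: list[str]) -> str:
    """Render the goal, form data, and runtime instructions into one text block.

    Hypothetical helper: the real prompt format is not documented here.
    """
    lines = [
        f"Goal: {goal}",
        "User data (JSON):",
        json.dumps(user_data, indent=2, sort_keys=True),
    ]
    if instructions:
        lines.append("Additional instructions:")
        lines.extend(f"- {item}" for item in instructions)
    return "\n".join(lines)


context = build_planner_context(
    goal="Fill out the shipping form",
    user_data={"name": "Jane Doe", "zip": "94107"},
    instructions=["Do not accept optional add-ons."],
)
print(context)
```

Keeping the form data as structured JSON, rather than folding it into the goal sentence, lets the planner look up each field value by name when it reaches the matching input.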
- Integrate Claude 3.5 Sonnet as an ActionPlanner
- Demonstrate successful BrowserAgent across a variety of tasks
- Create Python SDK
- Handle tabbed browsing
- Handle data extraction from website
- Improve vertical scrolling behavior
- Improve horizontal scrolling behavior
- Improve system prompt performance
- Improve mouse position marking on screenshots
- Add ability for converting browser sessions into training datasets
- Support for additional LLMs as an ActionPlanner
- Train a local model
- Integrate local model as an ActionPlanner
- Claude 3.5 safety refusals
  - Refuses to solve CAPTCHAs
  - Refuses to navigate when political content is on the page
Contributions to Cerebellum are welcome. For details on how to get involved, please refer to our CONTRIBUTING.md.
We appreciate all contributions, whether they're bug reports, feature requests, or code changes.
This project is licensed under the MIT License.