A configurable web browser automation agent powered by Hugging Face's smolagents framework, with support for multiple AI models including Gemini and Azure OpenAI. The agent navigates websites and performs tasks based on natural language instructions.
- Configurable Tasks: Define and manage different automation tasks through configuration files
- Multiple AI Models: Support for Google's Gemini and Microsoft's Azure OpenAI models for intelligent decision-making
- Secure Credential Management: Safely store and use credentials for authenticated tasks
- Screenshot Capture: Visual feedback of browser state during automation
- Extensible Tools: Rich set of browser interaction tools that can be extended
To install the agent:
- Clone this repository:
  git clone <repository-url>
  cd web_browser_agent
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your environment variables by creating a .env file with the following:
  # For Gemini
  GOOGLE_API_KEY=your_gemini_api_key_here
  # For Azure OpenAI
  AZURE_API_KEY=your_azure_api_key_here
  AZURE_ENDPOINT=your_azure_endpoint_here
  AZURE_API_VERSION=your_azure_api_version_here
  AZURE_DEPLOYMENT=your_azure_deployment_name_here
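As a rough illustration of how these variables might be consumed, the sketch below maps them onto keyword arguments in the style LiteLLM uses, where the model name carries a provider prefix (`gemini/...` or `azure/<deployment>`). The helper name `model_kwargs_from_env`, the choice to prefer Gemini when both providers are configured, and the default Gemini model name are assumptions for illustration and are not taken from this repository.

```python
import os
from typing import Mapping, Optional

AZURE_VARS = ("AZURE_API_KEY", "AZURE_ENDPOINT", "AZURE_API_VERSION", "AZURE_DEPLOYMENT")


def model_kwargs_from_env(
    env: Optional[Mapping[str, str]] = None,
    gemini_model: str = "gemini-2.0-flash",  # placeholder model name (assumption)
) -> dict:
    """Build LiteLLM-style keyword arguments from the variables in .env.

    Hypothetical helper: uses Gemini when GOOGLE_API_KEY is set,
    otherwise falls back to Azure OpenAI.
    """
    env = os.environ if env is None else env

    if env.get("GOOGLE_API_KEY"):
        return {
            "model": f"gemini/{gemini_model}",
            "api_key": env["GOOGLE_API_KEY"],
        }

    # No Gemini key: every Azure variable must be present.
    missing = [name for name in AZURE_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Set GOOGLE_API_KEY or all AZURE_* variables (missing: "
            + ", ".join(missing)
            + ")"
        )

    return {
        "model": f"azure/{env['AZURE_DEPLOYMENT']}",
        "api_key": env["AZURE_API_KEY"],
        "api_base": env["AZURE_ENDPOINT"],
        "api_version": env["AZURE_API_VERSION"],
    }
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test. How the agent itself reads these variables may differ.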
The easiest way to use the agent is through the web interface:
python main.py --web
This launches a Gradio web interface where you can:
- Run tasks with a user-friendly UI
- View task results and screenshots
- Manage tasks and credentials
- Configure agent settings
To share the interface publicly (useful for demos or remote access):
python main.py --web --share
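To run a predefined task from the command line:
python main.py --task wikipedia_search
To run a custom task from a description and a starting URL:
python main.py --description "Go to news.ycombinator.com and get the titles of the top 5 stories" --url "https://news.ycombinator.com"
To run a task with stored credentials:
python main.py --task amazon_toothbrush --credentials amazon
To run a task in headless mode (no visible browser window):
python main.py --task github_trending --headless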
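For readers who want to see how such a command line could be wired up, here is a minimal `argparse` sketch that mirrors the flags used in the commands above. It is not the repository's actual `main.py`: the help texts and the rule that at least one of `--web`, `--task`, or `--description` must be given are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--web", action="store_true", help="launch the Gradio web interface")
    parser.add_argument("--share", action="store_true", help="share the web interface publicly")
    parser.add_argument("--task", help="ID of a predefined task")
    parser.add_argument("--description", help="natural language description of a custom task")
    parser.add_argument("--url", help="starting URL for a custom task")
    parser.add_argument("--credentials", help="ID of stored credentials to use")
    parser.add_argument("--headless", action="store_true", help="run without a visible browser window")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: at least one of --web, --task, or --description is required.
    if not (args.web or args.task or args.description):
        parser.error("specify --web, --task, or --description")
    return args


if __name__ == "__main__":
    print(parse_args())
```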
You can manage tasks using the task manager utility:
# Add a new task interactively
python -m utils.task_manager --add
# List all available tasks
python -m utils.task_manager --list
# Show details of a specific task
python -m utils.task_manager --show github_trending
# Remove a task
python -m utils.task_manager --remove outdated_task
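The add, list, show, and remove operations above can be pictured as simple edits to a JSON file. The sketch below is a hypothetical stand-in, not the code in `utils.task_manager`: the class name `TaskStore` and the task fields `description` and `url` are assumptions chosen to match the command-line flags.

```python
import json
from pathlib import Path
from typing import Dict, List


class TaskStore:
    """Hypothetical JSON-backed task store; the file schema is assumed."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def _load(self) -> Dict[str, dict]:
        # A missing file is treated as an empty task list.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, tasks: Dict[str, dict]) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(tasks, indent=2), encoding="utf-8")

    def add(self, task_id: str, description: str, url: str) -> None:
        tasks = self._load()
        tasks[task_id] = {"description": description, "url": url}
        self._save(tasks)

    def list_ids(self) -> List[str]:
        return sorted(self._load())

    def show(self, task_id: str) -> dict:
        # Raises KeyError for an unknown task ID.
        return self._load()[task_id]

    def remove(self, task_id: str) -> bool:
        tasks = self._load()
        if task_id not in tasks:
            return False
        del tasks[task_id]
        self._save(tasks)
        return True
```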
You can manage credentials using the credential manager utility:
# Add credentials interactively
python -m utils.credential_manager --add amazon
# List all available credential IDs
python -m utils.credential_manager --list
# Remove credentials
python -m utils.credential_manager --remove old_account
The agent's behavior can be configured through the following files:
- config/tasks.json: Defines automation tasks
- config/credentials.json: Stores credentials for authenticated tasks
- config/settings.json: General settings for the agent
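A minimal sketch of reading these three files at startup might look like the following. The function name `load_config` and the choice to treat a missing file as an empty dictionary are assumptions. The contents of each file are left opaque here because their schema is not described in this document.

```python
import json
from pathlib import Path
from typing import Dict

CONFIG_NAMES = ("tasks", "credentials", "settings")


def load_config(config_dir: Path = Path("config")) -> Dict[str, dict]:
    """Read tasks.json, credentials.json, and settings.json from config_dir.

    Hypothetical helper: a missing file yields an empty dictionary.
    """
    config: Dict[str, dict] = {}
    for name in CONFIG_NAMES:
        path = Path(config_dir) / f"{name}.json"
        if path.exists():
            config[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            config[name] = {}
    return config
```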
The repository comes with several example tasks:
- Amazon Toothbrush: Search for an electric toothbrush on Amazon and add it to cart
- Wikipedia Search: Find specific information on Wikipedia
- GitHub Trending: Check trending repositories on GitHub
- Weather Check: Get weather forecast for a specific location
The agent requires the following:
- Python 3.8+
- Chrome browser
- Gemini API key or Azure OpenAI API key
- Gradio (for web interface)
- LiteLLM (for model integration)