Web Browser Automation Agent

A configurable web browser automation agent powered by Hugging Face's SmolaGent framework with support for multiple AI models including Gemini and Azure OpenAI. This agent can navigate websites and perform tasks based on natural language instructions.

Features

Configurable Tasks: Define and manage different automation tasks through configuration files
Multiple AI Models: Support for Google's Gemini and Microsoft's Azure OpenAI models for intelligent decision-making
Secure Credential Management: Safely store and use credentials for authenticated tasks
Screenshot Capture: Visual feedback of browser state during automation
Extensible Tools: Rich set of browser interaction tools that can be extended

Installation

Clone this repository:

git clone <repository-url>
cd web_browser_agent

Install the required dependencies:
```
pip install -r requirements.txt
```

Set up your environment variables: Create a .env file with the following:

# For Gemini
GOOGLE_API_KEY=your_gemini_api_key_here

# For Azure OpenAI
AZURE_API_KEY=your_azure_api_key_here
AZURE_ENDPOINT=your_azure_endpoint_here
AZURE_API_VERSION=your_azure_api_version_here
AZURE_DEPLOYMENT=your_azure_deployment_name_here

Usage

Web Interface

The easiest way to use the agent is through the web interface:

python main.py --web

This launches a Gradio web interface where you can:

Run tasks with a user-friendly UI
View task results and screenshots
Manage tasks and credentials
Configure agent settings

To share the interface publicly (useful for demos or remote access):

python main.py --web --share

Command Line Usage

Running a Predefined Task

python main.py --task wikipedia_search

Running a Custom Task

python main.py --description "Go to news.ycombinator.com and get the titles of the top 5 stories" --url "https://news.ycombinator.com"

Running with Credentials

python main.py --task amazon_toothbrush --credentials amazon

Running in Headless Mode

python main.py --task github_trending --headless

Managing Tasks

You can manage tasks using the task manager utility:

# Add a new task interactively
python -m utils.task_manager --add

# List all available tasks
python -m utils.task_manager --list

# Show details of a specific task
python -m utils.task_manager --show github_trending

# Remove a task
python -m utils.task_manager --remove outdated_task

Managing Credentials

You can manage credentials using the credential manager utility:

# Add credentials interactively
python -m utils.credential_manager --add amazon

# List all available credential IDs
python -m utils.credential_manager --list

# Remove credentials
python -m utils.credential_manager --remove old_account

Configuration

The agent's behavior can be configured through the following files:

config/tasks.json: Defines automation tasks
config/credentials.json: Stores credentials for authenticated tasks
config/settings.json: General settings for the agent

Example Tasks

The repository comes with several example tasks:

Amazon Toothbrush: Search for an electric toothbrush on Amazon and add it to cart
Wikipedia Search: Find specific information on Wikipedia
GitHub Trending: Check trending repositories on GitHub
Weather Check: Get weather forecast for a specific location

Requirements

Python 3.8+
Chrome browser
Gemini API key or Azure OpenAI API key
Gradio (for web interface)
LiteLLM (for model integration)

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
__pycache__		__pycache__
config		config
utils		utils
README.md		README.md
browser_agent.py		browser_agent.py
launch_web.py		launch_web.py
main.py		main.py
requirements.txt		requirements.txt
test_agent.py		test_agent.py
test_models.py		test_models.py
web_interface.py		web_interface.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Web Browser Automation Agent

Features

Installation

Usage

Web Interface

Command Line Usage

Running a Predefined Task

Running a Custom Task

Running with Credentials

Running in Headless Mode

Managing Tasks

Managing Credentials

Configuration

Example Tasks

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

zl0531/browser-use-ui

Folders and files

Latest commit

History

Repository files navigation

Web Browser Automation Agent

Features

Installation

Usage

Web Interface

Command Line Usage

Running a Predefined Task

Running a Custom Task

Running with Credentials

Running in Headless Mode

Managing Tasks

Managing Credentials

Configuration

Example Tasks

Requirements

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages