zl0531/browser-use-ui

Web Browser Automation Agent

A configurable web browser automation agent built on Hugging Face's smolagents framework, with support for multiple AI models including Google's Gemini and Azure OpenAI. The agent navigates websites and performs tasks described in natural-language instructions.

Features

  • Configurable Tasks: Define and manage different automation tasks through configuration files
  • Multiple AI Models: Support for Google's Gemini and Microsoft's Azure OpenAI models for intelligent decision-making
  • Secure Credential Management: Safely store and use credentials for authenticated tasks
  • Screenshot Capture: Visual feedback of browser state during automation
  • Extensible Tools: Rich set of browser interaction tools that can be extended

Installation

  1. Clone this repository:

    git clone <repository-url>
    cd web_browser_agent
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Set up your environment variables: create a .env file containing the variables for the model provider you plan to use (Gemini, Azure OpenAI, or both):

    # For Gemini
    GOOGLE_API_KEY=your_gemini_api_key_here
    
    # For Azure OpenAI
    AZURE_API_KEY=your_azure_api_key_here
    AZURE_ENDPOINT=your_azure_endpoint_here
    AZURE_API_VERSION=your_azure_api_version_here
    AZURE_DEPLOYMENT=your_azure_deployment_name_here
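
Before launching the agent, it can help to confirm which provider is fully configured. The following is a minimal sketch, not part of the project: the variable names come from the .env example above, while the function name `configured_providers` and its return values are illustrative assumptions. It only inspects the process environment, so it assumes the .env values have already been loaded (how the project loads them is not shown here).

```python
import os

# Variable names are taken from the .env example in this README.
# The grouping and the provider labels below are illustrative assumptions.
GEMINI_VARS = ("GOOGLE_API_KEY",)
AZURE_VARS = (
    "AZURE_API_KEY",
    "AZURE_ENDPOINT",
    "AZURE_API_VERSION",
    "AZURE_DEPLOYMENT",
)


def configured_providers(env=None):
    """Return the providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    providers = []
    if all(env.get(name) for name in GEMINI_VARS):
        providers.append("gemini")
    if all(env.get(name) for name in AZURE_VARS):
        providers.append("azure")
    return providers


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

A provider is reported only when every one of its variables is present, so a partially filled Azure section is treated as not configured.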
    

Usage

Web Interface

The easiest way to use the agent is through the web interface:

python main.py --web

This launches a Gradio web interface where you can:

  • Run tasks with a user-friendly UI
  • View task results and screenshots
  • Manage tasks and credentials
  • Configure agent settings

To share the interface publicly (useful for demos or remote access):

python main.py --web --share

Command Line Usage

Running a Predefined Task

python main.py --task wikipedia_search

Running a Custom Task

python main.py --description "Go to news.ycombinator.com and get the titles of the top 5 stories" --url "https://news.ycombinator.com"

Running with Credentials

python main.py --task amazon_toothbrush --credentials amazon

Running in Headless Mode

python main.py --task github_trending --headless
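
The flags used in the commands above can be summarized in a small argparse sketch. This is an illustration of how such a command line could be parsed, not the actual contents of main.py: the flag names come from this README, while `build_parser`, `run_mode`, and the mode labels are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the flags documented in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Web browser automation agent (CLI sketch)")
    parser.add_argument("--web", action="store_true", help="launch the web interface")
    parser.add_argument("--share", action="store_true", help="share the web interface publicly")
    parser.add_argument("--task", help="ID of a predefined task, e.g. wikipedia_search")
    parser.add_argument("--description", help="natural-language description of a custom task")
    parser.add_argument("--url", help="starting URL for a custom task")
    parser.add_argument("--credentials", help="ID of stored credentials, e.g. amazon")
    parser.add_argument("--headless", action="store_true", help="run the browser without a window")
    return parser


def run_mode(args):
    """Classify an invocation as 'web', 'predefined', or 'custom' (hypothetical labels)."""
    if args.web:
        return "web"
    if args.task:
        return "predefined"
    if args.description:
        return "custom"
    raise ValueError("expected one of --web, --task, or --description")


if __name__ == "__main__":
    # Parse a fixed example instead of sys.argv so the sketch runs as-is.
    example = build_parser().parse_args(["--task", "wikipedia_search", "--headless"])
    print(run_mode(example), example.headless)
```

In this sketch `--headless` and `--credentials` are independent modifiers that can accompany either a predefined or a custom task; whether the real tool enforces other combinations is not stated here.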

Managing Tasks

You can manage tasks using the task manager utility:

# Add a new task interactively
python -m utils.task_manager --add

# List all available tasks
python -m utils.task_manager --list

# Show details of a specific task
python -m utils.task_manager --show github_trending

# Remove a task
python -m utils.task_manager --remove outdated_task

Managing Credentials

You can manage credentials using the credential manager utility:

# Add credentials interactively
python -m utils.credential_manager --add amazon

# List all available credential IDs
python -m utils.credential_manager --list

# Remove credentials
python -m utils.credential_manager --remove old_account

Configuration

The agent's behavior can be configured through the following files:

  • config/tasks.json: Defines automation tasks
  • config/credentials.json: Stores credentials for authenticated tasks
  • config/settings.json: General settings for the agent
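
The schema of these files is not documented here, so the following is only a sketch under a stated assumption: that a tasks file is a JSON object mapping task IDs to entries with `description` and `url` fields (mirroring the `--description` and `--url` flags). The helper `load_task` and the example contents are hypothetical; check the real config/tasks.json for the actual layout.

```python
import json
import tempfile
from pathlib import Path


def load_task(path, task_id):
    """Look up one task by ID in a JSON file shaped like {task_id: {...}}."""
    tasks = json.loads(Path(path).read_text(encoding="utf-8"))
    if task_id not in tasks:
        raise KeyError(f"Unknown task {task_id!r}; available: {sorted(tasks)}")
    return tasks[task_id]


# Hypothetical file contents; the real config/tasks.json may use other fields.
EXAMPLE_TASKS = {
    "wikipedia_search": {
        "description": "Find specific information on Wikipedia",
        "url": "https://www.wikipedia.org",
    }
}

if __name__ == "__main__":
    # Write the example to a temporary file so the sketch does not touch config/.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "tasks.json"
        path.write_text(json.dumps(EXAMPLE_TASKS, indent=2), encoding="utf-8")
        print(load_task(path, "wikipedia_search")["url"])
```

Failing with the list of known task IDs makes a mistyped `--task` value easier to diagnose than a bare lookup error.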

Example Tasks

The repository comes with several example tasks:

  1. Amazon Toothbrush: Search for an electric toothbrush on Amazon and add it to the cart
  2. Wikipedia Search: Find specific information on Wikipedia
  3. GitHub Trending: Check trending repositories on GitHub
  4. Weather Check: Get weather forecast for a specific location

Requirements

  • Python 3.8+
  • Chrome browser
  • Gemini API key or Azure OpenAI API key
  • Gradio (for web interface)
  • LiteLLM (for model integration)

License

MIT License
