WebCopilot is a multimodal LLM-based AI web automation agent powered by Puppeteer. It combines the power of large language models with browser automation to create intelligent, adaptable web automation scripts. By leveraging visual and textual understanding, WebCopilot can interpret web pages and perform actions more reliably than traditional selector-based automation tools.
You can install WebCopilot globally using npm:
npm install -g webcopilot
Or run it directly using npx:
npx webcopilot -s your-script.txt
WebCopilot requires a valid Anthropic API key to use the Claude LLM APIs. You can set up your API key using one of the following methods:
- Environment Variable (recommended): Set the ANTHROPIC_API_KEY environment variable:
For macOS:
# For temporary use in the current terminal session
export ANTHROPIC_API_KEY=your_api_key_here
# For persistent use (add to your shell profile)
echo 'export ANTHROPIC_API_KEY=your_api_key_here' >> ~/.zshrc
source ~/.zshrc
For Linux:
# For temporary use in the current terminal session
export ANTHROPIC_API_KEY=your_api_key_here
# For persistent use (add to your shell profile)
echo 'export ANTHROPIC_API_KEY=your_api_key_here' >> ~/.bashrc
source ~/.bashrc
For Windows Command Prompt:
REM For temporary use in the current session
set ANTHROPIC_API_KEY=your_api_key_here
REM For persistent use (takes effect in new sessions)
setx ANTHROPIC_API_KEY your_api_key_here
For Windows PowerShell:
# For temporary use in the current session
$env:ANTHROPIC_API_KEY = "your_api_key_here"
# For persistent use
[Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "your_api_key_here", "User")
- Command Line Argument: Provide the key directly when running WebCopilot:
npx webcopilot -s your-script.txt -k your_api_key_here
- Configuration File: Add your API key to the .webcopilot_config.yml file in your project directory (see the Configuration section below).
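The three methods above can supply different values at the same time. This document does not state which one wins, apart from noting that -k overrides the default key. The sketch below assumes one plausible order (command line flag, then environment variable, then configuration file). The `resolveApiKey` helper and the `KeySources` shape are hypothetical names used for illustration and are not part of WebCopilot.

```typescript
// Hypothetical helper for illustration; not part of WebCopilot's API.
interface KeySources {
  cliKey?: string; // value passed with -k / --key
  env?: Record<string, string | undefined>; // for example, process.env
  configKey?: string; // apiKey value read from .webcopilot_config.yml
}

// ASSUMED precedence: CLI flag, then ANTHROPIC_API_KEY, then the config file.
function resolveApiKey(sources: KeySources): string {
  const candidates = [
    sources.cliKey,
    sources.env?.["ANTHROPIC_API_KEY"],
    sources.configKey,
  ];
  // Pick the first candidate that is defined and not blank.
  const key = candidates.find(
    (value) => value !== undefined && value.trim() !== ""
  );
  if (key === undefined) {
    throw new Error(
      "No Anthropic API key found (checked -k, ANTHROPIC_API_KEY, config file)"
    );
  }
  return key.trim();
}

console.log(
  resolveApiKey({
    cliKey: "key-from-cli",
    env: { ANTHROPIC_API_KEY: "key-from-env" },
  })
); // prints "key-from-cli"
```

Failing fast when no source provides a key gives a clearer error than a rejected API request later in the run.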
WebCopilot uses simple text files containing natural language instructions to automate web interactions. Each line in the script represents a single action to be performed.
Create a .txt file containing your automation steps. Each line should describe one action in natural language. For example:
navigate to https://example.com
type "search term" into the search box
click the "Submit" button
I should see the search results page
WebCopilot currently supports the following types of actions:
- Navigate: Go to a specific URL
  - Example: navigate to https://example.com
- Type: Enter text into form fields
  - Example: type "Hello World" into the input field
- Click: Click on elements
  - Example: click the Submit button
- Expect: Verify that elements or content are present
  - Example: I should see the login form
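To make the one-action-per-line format concrete, here is a rough pre-check that splits a script into lines and guesses the action type from the leading words used in the examples above. It is only an illustration: WebCopilot interprets instructions with an LLM, and nothing here reflects how it actually parses scripts. `classifyScript`, `ScriptStep`, and the keyword rules are assumptions.

```typescript
type ActionKind = "navigate" | "type" | "click" | "expect" | "unknown";

interface ScriptStep {
  lineNumber: number; // 1-based line number in the script file
  text: string; // the instruction with surrounding whitespace removed
  kind: ActionKind; // best guess based on the leading words
}

// Hypothetical pre-check: guesses the action type from how each line starts.
function classifyScript(script: string): ScriptStep[] {
  const steps: ScriptStep[] = [];
  script.split(/\r?\n/).forEach((rawLine, index) => {
    const text = rawLine.trim();
    if (text === "") return; // skip blank lines
    const lower = text.toLowerCase();
    let kind: ActionKind = "unknown";
    if (lower.startsWith("navigate to ")) kind = "navigate";
    else if (lower.startsWith("type ")) kind = "type";
    else if (lower.startsWith("click ")) kind = "click";
    else if (lower.startsWith("i should see ")) kind = "expect";
    steps.push({ lineNumber: index + 1, text, kind });
  });
  return steps;
}

const sampleScript = [
  "navigate to https://example.com",
  'type "search term" into the search box',
  'click the "Submit" button',
  "I should see the search results page",
].join("\n");

for (const step of classifyScript(sampleScript)) {
  console.log(`${step.lineNumber}: [${step.kind}] ${step.text}`);
}
```

A line reported as `unknown` is not necessarily invalid; it only means the wording differs from the documented examples.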
Here's a complete example script that searches UCI's website:
navigate to https://uci.edu
type "Computer Science" into the search bar on the top right
click the "web" button
click the title of the first item in the search results
I should see the Department of Computer Science home page
npx webcopilot [options]
Options:
  -s, --script <path>   Path to script file (required)
  -h, --headless        Run in headless mode
  -c, --chrome          Use system installed Chrome
  -k, --key <key>       Override default API key
You can override the default configuration by creating a .webcopilot_config.yml file in your project directory. Below are the available configuration options with their default values:
behavior:
  headless: false    # Run browser in headless mode (false shows the browser UI)
  useChrome: false   # Use system Chrome instead of bundled Chromium
  keepAlive: false   # Keep browser window open after script finishes executing
viewport:
  width: 1024        # Browser viewport width
  height: 1024       # Browser viewport height
network:
  block:             # Array of URLs to block (e.g., analytics)
    - "*.googletagmanager.com/*"
    - "*.google-analytics.com/*"
retry:
  enabled: true      # Enable retry mechanism
  maxRetries: 3      # Maximum number of retry attempts
  retryDelay: 5000   # Delay between retries in milliseconds
llm:
  cache:
    enabled: false   # Enable LLM response caching
    path: "llm_cache"   # Path to cache directory
  claude:
    apiKey: "your-api-key-here"   # Anthropic API key
    model: "claude-3-5-sonnet-20241022"   # Claude model to use
    temperature: 0.7      # Model temperature (0-1)
    maxTokens: 1024       # Maximum tokens in response
    topP: 1               # Top P sampling parameter
    topK: 1               # Top K sampling parameter
    frequencyPenalty: 0   # Frequency penalty for token generation
    presencePenalty: 0    # Presence penalty for token generation
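The network block entries above look like glob patterns. How WebCopilot matches them is not documented here, so the sketch below shows only one plausible reading: `*` stands for any run of characters, and the pattern is compared against the URL with its scheme removed. `globToRegExp` and `isBlocked` are hypothetical helpers, not WebCopilot functions.

```typescript
// Hypothetical matcher: "*" means "any run of characters"; every other
// character in the pattern is matched literally.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// ASSUMPTION: patterns are tested against the URL without its scheme.
function isBlocked(url: string, patterns: string[]): boolean {
  const withoutScheme = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  return patterns.some((pattern) => globToRegExp(pattern).test(withoutScheme));
}

const blockList = ["*.googletagmanager.com/*", "*.google-analytics.com/*"];
console.log(isBlocked("https://www.googletagmanager.com/gtm.js", blockList)); // true
console.log(isBlocked("https://example.com/index.html", blockList)); // false
```

Under this reading a pattern such as `*.googletagmanager.com/*` requires a subdomain, so a bare `googletagmanager.com/...` URL would not match.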
Your .webcopilot_config.yml only needs to contain the settings you want to change; any of the settings above can be overridden. For example:
behavior:
  headless: true    # Run browser without GUI (headless mode)
  useChrome: true   # Use system Chrome browser instead of bundled Chromium
This runs the browser in headless mode using the system-installed Chrome. All other settings keep their default values.
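One way to picture how a partial file overrides the defaults is a recursive merge, sketched below. This is an assumption about the behavior, not WebCopilot's actual loader: `mergeConfig`, `ConfigObject`, and the nesting of the sample objects are illustrative only.

```typescript
type ConfigValue = string | number | boolean | ConfigValue[] | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isConfigObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && !Array.isArray(value);
}

// Recursively overlays `overrides` on `defaults`. Nested sections are merged
// key by key; scalars and arrays are replaced whole. Inputs are not mutated.
function mergeConfig(defaults: ConfigObject, overrides: ConfigObject): ConfigObject {
  const merged: ConfigObject = { ...defaults };
  for (const [key, next] of Object.entries(overrides)) {
    const base = merged[key];
    merged[key] =
      isConfigObject(base) && isConfigObject(next) ? mergeConfig(base, next) : next;
  }
  return merged;
}

// Sample values taken from the defaults listed above (nesting assumed).
const defaultConfig: ConfigObject = {
  behavior: { headless: false, useChrome: false, keepAlive: false },
  viewport: { width: 1024, height: 1024 },
};
const userConfig: ConfigObject = {
  behavior: { headless: true, useChrome: true },
};

console.log(JSON.stringify(mergeConfig(defaultConfig, userConfig), null, 2));
```

In the merged result `headless` and `useChrome` become `true`, while `keepAlive` and the whole `viewport` section keep their defaults.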
- Currently, the caching mechanism in WebCopilot only checks whether the prompt matches a previous call. The attached snapshots are not part of the cache key.
- If you expect the webpage to change between runs, do not enable caching, as it may return stale responses.
- We plan to implement image comparison for the caching feature in a future release.
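The limitation described in these notes can be reproduced with a toy cache that is keyed by prompt text only. This is a minimal sketch with hypothetical names (`PromptOnlyCache`, the snapshot strings, and the cached response); WebCopilot's real cache is stored on disk and may differ in every other respect.

```typescript
// Hypothetical in-memory cache keyed by prompt text only.
class PromptOnlyCache {
  private readonly entries = new Map<string, string>();

  get(prompt: string, _snapshot: string): string | undefined {
    // The snapshot is deliberately ignored: it is not part of the key.
    return this.entries.get(prompt);
  }

  set(prompt: string, _snapshot: string, response: string): void {
    this.entries.set(prompt, response);
  }
}

const demoCache = new PromptOnlyCache();

// First run: the response is stored under the prompt.
demoCache.set('click the "Submit" button', "snapshot-from-run-1", "click at (120, 340)");

// Second run: same instruction, but the page (and so the snapshot) has changed.
const staleHit = demoCache.get('click the "Submit" button', "snapshot-from-run-2");
console.log(staleHit); // "click at (120, 340)", computed for the old page
```

Including a hash of the snapshot in the key would avoid this stale hit, at the cost of fewer cache hits on pages that change slightly between runs.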
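You can test the package locally before publishing to npm in two ways: by linking it globally with npm link, or by running the built entry point directly. The first block of commands below shows the npm link approach and the second shows the direct approach: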
# In your webcopilot project directory
npm run build # Build the TypeScript files
npm link # Create a global link
# Now you can use npx webcopilot from anywhere
npx webcopilot -s your-script.txt
# In your webcopilot project directory
npm run build # Build the TypeScript files
npx ./dist/index.js -s your-script.txt
To unlink the package when you're done testing:
npm unlink webcopilot
webcopilot/
├── src/ # Source files
├── dist/ # Compiled JavaScript files
├── config/ # Configuration files
│ └── default.yml # Default configuration
└── tests/ # Test files
Contributions are welcome! Please feel free to submit a pull request.
This project is released under the MIT License; see the LICENSE file for details.
- Xiaohan "Clement" Tian
- Leo Huang
- Ada Lo
- Tony Zhang
- Joel Thoms
- Queensley Lim
- Jenny Noh
- William He
- Feiyang Jin
- Kiran Maya Sheikh
- Vianey Flores Mursio