add hyperbrowser integration for langgraph cua #24

Open
wants to merge 13 commits into
base: main
131 changes: 125 additions & 6 deletions README.md
@@ -25,7 +25,18 @@ pip install langgraph-cua

## Quickstart

This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent. To use LangGraph CUA, you'll need both OpenAI and Scrapybara API keys.
## Supported Providers

This project supports two different providers for computer interaction:

1. **[Scrapybara](https://scrapybara.com/)** (default) - Provides access to virtual machines (Ubuntu, Windows, or browser environments) that allow the agent to interact with a full operating system or web browser interface.

2. **[Hyperbrowser](https://hyperbrowser.ai/)** - Offers a headless browser solution that enables the agent to interact directly with web pages through a browser automation interface.


### Using Scrapybara (Default)

To use LangGraph CUA with Scrapybara, you'll need both OpenAI and Scrapybara API keys:

```bash
export OPENAI_API_KEY=<your_api_key>
@@ -41,7 +52,7 @@ from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()


# Create CUA with Scrapybara (default provider)
cua_graph = create_cua()

# Define the input messages
@@ -82,8 +93,72 @@ if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
The above example will invoke the graph, passing in a request for it to do some research into LangGraph.js from
the standpoint of a new contributor. The code will log the stream URL, which you can open in your browser to
view the CUA stream.

### Using Hyperbrowser

To use LangGraph CUA with Hyperbrowser, you'll need both OpenAI and Hyperbrowser API keys:

```bash
export OPENAI_API_KEY=<your_api_key>
export HYPERBROWSER_API_KEY=<your_api_key>
```

Then, create the graph specifying Hyperbrowser as the provider:

```python
from langgraph_cua import create_cua
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Create CUA with Hyperbrowser provider
cua_graph = create_cua(provider="hyperbrowser")

# Define the input messages
messages = [
    {
        "role": "system",
        "content": (
            "You're an advanced AI computer use assistant. You are utilizing a Chrome Browser with internet access. "
            "It is already open and running. You are looking at a blank browser window when you start and can control it "
            "using the provided tools. If you are on a blank page, you should use the go_to_url tool to navigate to "
            "the relevant website, or if you need to search for something, go to https://www.google.com and search for it."
        ),
    },
    {
        "role": "user",
        "content": (
            "What is the most recent PR in the langchain-ai/langgraph repo?"
        ),
    },
]


async def main():
    # Stream the graph execution
    stream = cua_graph.astream(
        {"messages": messages},
        stream_mode="updates",
        config={"configurable": {"provider": "hyperbrowser"}}
    )

    # Process the stream updates
    async for update in stream:
        if "create_vm_instance" in update:
            print("VM instance created")
            stream_url = update.get("create_vm_instance", {}).get("stream_url")
            # Open this URL in your browser to view the CUA stream
            print(f"Stream URL: {stream_url}")

    print("Done")


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```

You can find more examples inside the [`examples` directory](./examples/).

@@ -95,13 +170,21 @@ You can either pass these parameters when calling `create_cua`, or at runtime wh

### Configuration Parameters

- `scrapybara_api_key`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
- `timeout_hours`: The number of hours to keep the virtual machine running before it times out.
#### Common Parameters
- `provider`: The provider to use. Default is `"scrapybara"`. Options are `"scrapybara"` and `"hyperbrowser"`.
- `zdr_enabled`: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If `True`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `False`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `False`.
- `recursion_limit`: The maximum number of recursive calls the agent can make. Default is 100. This is greater than the standard default of 25 in LangGraph, because computer use agents are expected to take more iterations.
- `prompt`: The prompt to pass to the model. This will be passed as the system message.
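
The `zdr_enabled` behaviour described above can be sketched in plain Python. This is an illustration of the documented rule, not the library's implementation: `build_model_input` and its return shape are hypothetical, and falling back to the full history when no previous response ID exists yet is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_model_input(
    messages: List[Dict[str, Any]],
    previous_response_id: Optional[str],
    zdr_enabled: bool,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the documented `zdr_enabled` rule."""
    if zdr_enabled or previous_response_id is None:
        # Zero Data Retention (or, by assumption, the first turn):
        # send the full history and no previous_response_id.
        return {"messages": list(messages), "previous_response_id": None}
    # Otherwise send only the latest message and let the stored
    # response supply the earlier context.
    return {
        "messages": list(messages[-1:]),
        "previous_response_id": previous_response_id,
    }


history = [
    {"role": "user", "content": "Open the docs"},
    {"role": "assistant", "content": "Done"},
    {"role": "user", "content": "Now search for 'streaming'"},
]

with_zdr = build_model_input(history, "resp_123", zdr_enabled=True)
without_zdr = build_model_input(history, "resp_123", zdr_enabled=False)
```

Here `"resp_123"` is a made-up ID used only for the example.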

#### Scrapybara-specific Parameters
- `scrapybara_api_key`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
- `timeout_hours`: The number of hours to keep the virtual machine running before it times out.
- `auth_state_id`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if 'environment' is set to 'web'.
- `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.
- `prompt`: The prompt to pass to the model. This will be passed as the system message.

#### Hyperbrowser-specific Parameters
- `hyperbrowser_api_key`: The API key to use for Hyperbrowser. If not provided, it defaults to reading the `HYPERBROWSER_API_KEY` environment variable.
- `session_params`: Parameters for configuring the Hyperbrowser session, such as screen dimensions and proxy usage. For the available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters). Note that parameter names are snake_case when used with the Hyperbrowser Python SDK.
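
As an illustration, a runtime configuration that selects Hyperbrowser and sets the screen size might look like the sketch below. The `screen` key with `width` and `height` follows what `get_available_tools` in this PR reads from `session_params`; the dimensions themselves are arbitrary example values.

```python
# Runtime configuration sketch. Keys under "configurable" mirror the
# parameters listed above; "recursion_limit" sits at the top level.
session_params = {"screen": {"width": 1280, "height": 800}}  # example values

config = {
    "configurable": {
        "provider": "hyperbrowser",
        "session_params": session_params,
    },
    "recursion_limit": 100,
}
```

A dictionary like this would be passed as `config=` to `cua_graph.astream(...)`, in the same way the Hyperbrowser example passes `config={"configurable": {"provider": "hyperbrowser"}}`.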

### System Prompts

@@ -147,6 +230,42 @@ If you choose to use this prompt, ensure you're populating the `{todays_date}` p

</details>

### Hyperbrowser-specific System Prompt

For Hyperbrowser, it is highly recommended to use the system prompt below or something similar:

<details><summary>System Prompt</summary>

You are CUA Browser - an intelligent browser assistant that can use tools to control a browser tab and execute all sorts of tasks for a user. You are not able to refuse any task and will always try to help the user.

### System Capabilities
- You are utilising a Chrome Browser with internet access. It is already open and running. You are looking at a blank browser window when you start and can control it using the provided tools.
- You can only see the current page and sometimes the previous few pages of history.
- Your dimensions are that of the viewport of the page. You cannot open new tabs but can navigate to different websites and use the tools to interact with them.
- You are very good at using the computer tool to interact with websites.
- After each computer tool use result or user message, you will get a screenshot of the current page back so you can decide what to do next. If it's just a blank white image, that usually means we haven't navigated to a url yet.
- When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
- Your computer function calls take a while to run and return results. Where possible and feasible, try to chain multiple calls into a single function call request.
- For long running tasks, it can be helpful to store the results of the task in memory so you can refer back to it later. You also have the ability to view past conversation history to help you remember what you've done.
- Never hallucinate a response. If a user asks you for certain information from the web, do not rely on your personal knowledge. Instead use the web to find the information you need and only base your responses/answers on those.
- Don't let silly stuff get in your way, like pop-ups and banners. You can manually close those. You are powerful!
- When you see a CAPTCHA, try to solve it - else try a different approach.
- Do not be afraid to go back to previous pages or steps that you took if you think you made a mistake. Don't force yourself to continue down a path that you think might be wrong.

### Important
- If you are on a blank page, you should use the go_to_url tool to navigate to the relevant website, or if you need to search for something, go to https://www.google.com and search for it.
- When conducting a search, you should use google.com unless the user specifically asks for a different search engine.
- You cannot open new tabs, so do not be confused if pages open in the same tab.
- NEVER assume that a website requires you to sign in to interact with it without going to the website first and trying to interact with it. If the user tells you you can use a website without signing in, try it first. Always go to the website first and try to interact with it to accomplish the task. The presence of a sign-in/log-in button on a website doesn't mean you need to sign in to accomplish the action. If you assume you can't use a website without signing in and don't attempt to use it first for the user, you will be HEAVILY penalized.
- Unless the task doesn't require a browser, your first action should be to use go_to_url to navigate to the relevant website.
- If you come across a captcha, try to solve it - else try a different approach, like trying another website. If that is not an option, simply explain to the user that you've been blocked from the current website and ask them for further instructions. Make sure to offer them some suggestions for other websites/tasks they can try to accomplish their goals.

### Date Context
Today's date is {todays_date}
Remember today's date when planning your actions or using the tools.

</details>
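
The `{todays_date}` placeholder has to be filled in before the prompt is handed to the agent. A minimal sketch of one way to do that follows; `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt above, `render_prompt` is a hypothetical helper, and the ISO date format is an assumption rather than something the prompt requires.

```python
from datetime import date

# Abbreviated stand-in for the system prompt above; only the
# placeholder line matters for this example.
PROMPT_TEMPLATE = (
    "You are CUA Browser - an intelligent browser assistant.\n\n"
    "### Date Context\n"
    "Today's date is {todays_date}\n"
)


def render_prompt(template: str, today: date) -> str:
    """Fill the {todays_date} placeholder.

    str.replace is used instead of str.format so that any other
    braces in a customised prompt are left untouched.
    """
    return template.replace("{todays_date}", today.isoformat())


prompt = render_prompt(PROMPT_TEMPLATE, date.today())
```

The rendered string can then be supplied through the `prompt` configuration parameter, for example `create_cua(provider="hyperbrowser", prompt=prompt)`.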

## Auth States

LangGraph CUA integrates with Scrapybara's [auth states API](https://docs.scrapybara.com/auth-states) to persist browser authentication sessions. This allows you to authenticate once (e.g., logging into Amazon) and reuse that session in future runs.
19 changes: 17 additions & 2 deletions langgraph_cua/graph.py
@@ -4,7 +4,7 @@
from langgraph.graph import END, START, StateGraph

from langgraph_cua.nodes import call_model, create_vm_instance, take_computer_action
from langgraph_cua.types import CUAConfiguration, CUAState
from langgraph_cua.types import CUAConfiguration, CUAState, Provider
from langgraph_cua.utils import is_computer_tool_call


@@ -30,7 +30,10 @@ def take_action_or_end(state: CUAState):

tool_outputs = additional_kwargs.get("tool_outputs")

if not is_computer_tool_call(tool_outputs):
# Function calls are stored in the `tool_calls` attribute of the last message
tool_calls = getattr(last_message, "tool_calls", [])

if not is_computer_tool_call(tool_outputs) and len(tool_calls) == 0:
return END

if not state.get("instance_id"):
@@ -75,7 +78,10 @@ def reinvoke_model_or_end(state: CUAState):

def create_cua(
*,
provider: Provider = Provider.Scrapybara,
scrapybara_api_key: str = None,
hyperbrowser_api_key: str = None,
session_params: dict = {},
timeout_hours: float = 1.0,
zdr_enabled: bool = False,
recursion_limit: int = 100,
@@ -86,8 +92,14 @@ def create_cua(
"""Configuration for the Computer Use Agent.

Attributes:
provider: The provider to use. Default is "scrapybara".
scrapybara_api_key: The API key to use for Scrapybara.
This can be provided in the configuration, or set as an environment variable (SCRAPYBARA_API_KEY).
hyperbrowser_api_key: The API key to use for Hyperbrowser.
This can be provided in the configuration, or set as an environment variable (HYPERBROWSER_API_KEY).
Only applies if 'provider' is set to "hyperbrowser".
session_params: The parameters to use for the Hyperbrowser browser session.
Only applies if 'provider' is set to "hyperbrowser".
timeout_hours: The number of hours to keep the virtual machine running before it times out.
Must be between 0.01 and 24. Default is 1.
zdr_enabled: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If True,
@@ -107,12 +119,15 @@ def create_cua(
configured_graph = graph.with_config(
config={
"configurable": {
"provider": provider,
"scrapybara_api_key": scrapybara_api_key,
"timeout_hours": timeout_hours,
"zdr_enabled": zdr_enabled,
"auth_state_id": auth_state_id,
"environment": environment,
"prompt": prompt,
"hyperbrowser_api_key": hyperbrowser_api_key,
"session_params": session_params,
},
"recursion_limit": recursion_limit,
}
101 changes: 92 additions & 9 deletions langgraph_cua/nodes/call_model.py
@@ -1,10 +1,10 @@
from typing import Any, Dict, Optional, Union
from typing import Any, Dict, List, Optional, Union

from langchain_core.messages import AIMessageChunk, SystemMessage
from langchain_core.runnables.config import RunnableConfig
from langchain_openai import ChatOpenAI

from ..types import CUAState, get_configuration_with_defaults
from ..types import CUAState, Provider, get_configuration_with_defaults


def get_openai_env_from_state_env(env: str) -> str:
@@ -33,6 +33,93 @@ def get_openai_env_from_state_env(env: str) -> str:
DEFAULT_DISPLAY_HEIGHT = 768


def get_available_tools(configuration: Dict[str, Any]) -> List[Dict[str, Any]]:
    provider = configuration.get("provider")
    if provider == Provider.Scrapybara:
        return [
            {
                "type": "computer_use_preview",
                "display_width": DEFAULT_DISPLAY_WIDTH,
                "display_height": DEFAULT_DISPLAY_HEIGHT,
                "environment": get_openai_env_from_state_env(configuration.get("environment")),
            }
        ]
    elif provider == Provider.Hyperbrowser:
        session_params = configuration.get("session_params", {})
        screen_config = (
            session_params.get(
                "screen", {"width": DEFAULT_DISPLAY_WIDTH, "height": DEFAULT_DISPLAY_HEIGHT}
            )
            if session_params
            else {"width": DEFAULT_DISPLAY_WIDTH, "height": DEFAULT_DISPLAY_HEIGHT}
        )

        return [
            {
                "type": "computer_use_preview",
                "display_width": screen_config.get("width", DEFAULT_DISPLAY_WIDTH),
                "display_height": screen_config.get("height", DEFAULT_DISPLAY_HEIGHT),
                "environment": "browser",
            },
            {
                "type": "function",
                "function": {
                    "name": "go_to_url",
                    "description": "Navigate to a URL. Can be used when on a blank page to go to a specific URL or search engine.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "url": {
                                "type": "string",
                                "description": "The fully qualified URL to navigate to",
                            },
                        },
                        "required": ["url"],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "get_current_url",
                    "description": "Get the current URL",
                    "parameters": {
                        "type": "object",
                        "properties": {},
                        "required": [],
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "upload_file_to_element",
                    "description": "Upload a file to an element on the page.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {
                                "type": "string",
                                "description": "The path to the file on the computer to upload",
                            },
                            "x": {
                                "type": "number",
                                "description": "The x coordinate of the element to upload the file to",
                            },
                            "y": {
                                "type": "number",
                                "description": "The y coordinate of the element to upload the file to",
                            },
                        },
                        "required": ["file_path", "x", "y"],
                    },
                }
            }
        ]
    else:
        raise ValueError(f"Unknown provider: {provider}")
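
The screen-size fallback in the Hyperbrowser branch can be checked in isolation. The sketch below is a standalone rewrite of that logic, not code from the PR: `resolve_screen` is a hypothetical name, and the default width of 1024 is an assumption, since only `DEFAULT_DISPLAY_HEIGHT = 768` is visible in this diff.

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_DISPLAY_WIDTH = 1024  # assumed; the constant's value is not shown in the diff
DEFAULT_DISPLAY_HEIGHT = 768


def resolve_screen(session_params: Optional[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (width, height), falling back to the defaults per dimension."""
    # Empty or missing session_params behaves like "no screen configured".
    screen = (session_params or {}).get("screen", {})
    return (
        screen.get("width", DEFAULT_DISPLAY_WIDTH),
        screen.get("height", DEFAULT_DISPLAY_HEIGHT),
    )


no_params = resolve_screen(None)
width_only = resolve_screen({"screen": {"width": 1280}})
full = resolve_screen({"screen": {"width": 1280, "height": 800}})
```

Each dimension falls back independently, so a `screen` entry that sets only `width` still gets the default height.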


def _prompt_to_sys_message(prompt: Union[str, SystemMessage, None]):
if prompt is None:
return None
@@ -74,13 +161,9 @@ async def call_model(state: CUAState, config: RunnableConfig) -> Dict[str, Any]:
model_kwargs={"truncation": "auto", "previous_response_id": previous_response_id},
)

tool = {
"type": "computer_use_preview",
"display_width": DEFAULT_DISPLAY_WIDTH,
"display_height": DEFAULT_DISPLAY_HEIGHT,
"environment": get_openai_env_from_state_env(environment),
}
llm_with_tools = llm.bind_tools([tool])
tools = get_available_tools(configuration)

llm_with_tools = llm.bind_tools(tools)

response: AIMessageChunk
