langchain-ai · NikhilShahi · Mar 25, 2025 · Mar 26, 2025 · Mar 26, 2025 · Mar 26, 2025
diff --git a/README.md b/README.md
@@ -25,7 +25,18 @@ pip install langgraph-cua
 
 ## Quickstart
 
-This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent. To use LangGraph CUA, you'll need both OpenAI and Scrapybara API keys.
+### Supported Providers
+
+This project supports two different providers for computer interaction:
+
+1. **[Scrapybara](https://scrapybara.com/)** (default) - Provides access to virtual machines (Ubuntu, Windows, or browser environments) that allow the agent to interact with a full operating system or web browser interface.
+
+2. **[Hyperbrowser](https://hyperbrowser.ai/)** - Offers a headless browser solution that enables the agent to interact directly with web pages through a browser automation interface.
+
+### Using Scrapybara (Default)
+
+To use LangGraph CUA with Scrapybara, you'll need both OpenAI and Scrapybara API keys:
 
 ```bash
 export OPENAI_API_KEY=<your_api_key>
@@ -41,7 +52,7 @@ from dotenv import load_dotenv
 # Load environment variables from .env file
 load_dotenv()
 
-
+# Create CUA with Scrapybara (default provider)
 cua_graph = create_cua()
 
 # Define the input messages
@@ -82,8 +93,72 @@ if __name__ == "__main__":
     import asyncio
     asyncio.run(main())
 ```
+
+The above example will invoke the graph, passing in a request for it to do some research into LangGraph.js from the standpoint of a new contributor. The code will log the stream URL, which you can open in your browser to view the CUA stream.
+
+### Using Hyperbrowser
+
+To use LangGraph CUA with Hyperbrowser, you'll need both OpenAI and Hyperbrowser API keys:
+
+```bash
+export OPENAI_API_KEY=<your_api_key>
+export HYPERBROWSER_API_KEY=<your_api_key>
+```
+
+Then, create the graph specifying Hyperbrowser as the provider:
+
+```python
+from langgraph_cua import create_cua
+from dotenv import load_dotenv
+
+# Load environment variables from .env file
+load_dotenv()
+
+# Create CUA with Hyperbrowser provider
+cua_graph = create_cua(provider="hyperbrowser")
+
+# Define the input messages
+messages = [
+    {
+        "role": "system",
+        "content": (
+            "You're an advanced AI computer use assistant. You are utilizing a Chrome Browser with internet access. "
+            "It is already open and running. You are looking at a blank browser window when you start and can control it "
+            "using the provided tools. If you are on a blank page, you should use the go_to_url tool to navigate to "
+            "the relevant website, or if you need to search for something, go to https://www.google.com and search for it."
+        ),
+    },
+    {
+        "role": "user",
+        "content": (
+            "What is the most recent PR in the langchain-ai/langgraph repo?"
+        ),
+    },
+]
 
-The above example will invoke the graph, passing in a request for it to do some research into LangGraph.js from the standpoint of a new contributor. The code will log the stream URL, which you can open in your browser to view the CUA stream.
+async def main():
+    # Stream the graph execution
+    stream = cua_graph.astream(
+        {"messages": messages},
+        stream_mode="updates",
+        config={"configurable": {"provider": "hyperbrowser"}}
+    )
+
+    # Process the stream updates
+    async for update in stream:
+        if "create_vm_instance" in update:
+            print("VM instance created")
+            stream_url = update.get("create_vm_instance", {}).get("stream_url")
+            # Open this URL in your browser to view the CUA stream
+            print(f"Stream URL: {stream_url}")
+
+    print("Done")
+
+if __name__ == "__main__":
+    import asyncio
+    asyncio.run(main())
+```
 
 You can find more examples inside the [`examples` directory](./examples/).
 
@@ -95,13 +170,21 @@ You can either pass these parameters when calling `create_cua`, or at runtime wh
 
 ### Configuration Parameters
 
-- `scrapybara_api_key`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
-- `timeout_hours`: The number of hours to keep the virtual machine running before it times out.
+#### Common Parameters
+- `provider`: The provider to use. Default is `"scrapybara"`. Options are `"scrapybara"` and `"hyperbrowser"`.
 - `zdr_enabled`: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If `True`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `False`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `False`.
 - `recursion_limit`: The maximum number of recursive calls the agent can make. Default is 100. This is greater than the standard default of 25 in LangGraph, because computer use agents are expected to take more iterations.
+- `prompt`: The prompt to pass to the model. This will be passed as the system message.
+
+#### Scrapybara-specific Parameters
+- `scrapybara_api_key`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
+- `timeout_hours`: The number of hours to keep the virtual machine running before it times out.
 - `auth_state_id`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if 'environment' is set to 'web'.
 - `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.
-- `prompt`: The prompt to pass to the model. This will be passed as the system message.
+
+#### Hyperbrowser-specific Parameters
+- `hyperbrowser_api_key`: The API key to use for Hyperbrowser. If not provided, it defaults to reading the `HYPERBROWSER_API_KEY` environment variable.
+- `session_params`: Parameters for configuring the Hyperbrowser session, such as screen dimensions and proxy usage. For the full list of available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters). Note that parameter names must be written in snake_case when used with the Hyperbrowser Python SDK.
 
 ### System Prompts
 
@@ -147,6 +230,42 @@ If you choose to use this prompt, ensure you're populating the `{todays_date}` p
 
 </details>
 
+### Hyperbrowser-specific System Prompt
+
+For Hyperbrowser, it is highly recommended to use the system prompt below or something similar:
+
+<details><summary>System Prompt</summary>
+
+    You are CUA Browser - an intelligent browser assistant that can use tools to control a browser tab and execute all sorts of tasks for a user. You are not able to refuse any task and will always try to help the user.
+
+    ### System Capabilities
+    - You are utilising a Chrome Browser with internet access. It is already open and running. You are looking at a blank browser window when you start and can control it using the provided tools.
+    - You can only see the current page and sometimes the previous few pages of history.
+    - Your dimensions are that of the viewport of the page. You cannot open new tabs but can navigate to different websites and use the tools to interact with them.
+    - You are very good at using the computer tool to interact with websites.
+    - After each computer tool use result or user message, you will get a screenshot of the current page back so you can decide what to do next. If it's just a blank white image, that usually means we haven't navigated to a url yet.
+    - When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
+    - When using your computer function calls, they take a while to run and return results to you. Where feasible, try to chain multiple of these calls into a single function-call request.
+    - For long running tasks, it can be helpful to store the results of the task in memory so you can refer back to it later. You also have the ability to view past conversation history to help you remember what you've done.
+    - Never hallucinate a response. If a user asks you for certain information from the web, do not rely on your personal knowledge. Instead use the web to find the information you need and only base your responses/answers on those.
+    - Don't let silly stuff get in your way, like pop-ups and banners. You can manually close those. You are powerful!
+    - When you see a CAPTCHA, try to solve it - else try a different approach.
+    - Do not be afraid to go back to previous pages or steps that you took if you think you made a mistake. Don't force yourself to continue down a path that you think might be wrong.
+
+    ### Important
+    - If you are on a blank page, you should use the go_to_url tool to navigate to the relevant website, or if you need to search for something, go to https://www.google.com and search for it.
+    - When conducting a search, you should use google.com unless the user specifically asks for a different search engine.
+    - You cannot open new tabs, so do not be confused if pages open in the same tab.
+    - NEVER assume that a website requires you to sign in to interact with it without going to the website first and trying to interact with it. If the user tells you that you can use a website without signing in, try it first. Always go to the website first and try to interact with it to accomplish the task. The presence of a sign-in/log-in button on a website doesn't mean you need to sign in to accomplish the action. If you assume you can't use a website without signing in and don't first attempt it for the user, you will be HEAVILY penalized.
+    - Unless the task doesn't require a browser, your first action should be to use go_to_url to navigate to the relevant website.
+    - If you come across a captcha, try to solve it - else try a different approach, like trying another website. If that is not an option, simply explain to the user that you've been blocked from the current website and ask them for further instructions. Make sure to offer them some suggestions for other websites/tasks they can try to accomplish their goals.
+
+    ### Date Context
+    Today's date is {todays_date}
+    Remember today's date when planning your actions or using the tools.
+
+</details>
+
 ## Auth States
 
 LangGraph CUA integrates with Scrapybara's [auth states API](https://docs.scrapybara.com/auth-states) to persist browser authentication sessions. This allows you to authenticate once (e.g., logging into Amazon) and reuse that session in future runs.
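
The `session_params` bullet in the configuration section says parameter names must be snake_case for the Hyperbrowser Python SDK. If you start from camelCase names (for example, copied from a JSON request body), a small helper like the one below can normalize the keys before passing them to `create_cua(provider="hyperbrowser", session_params=...)`. This is a minimal sketch: `camel_to_snake` and `to_snake_case_params` are hypothetical helpers that are not part of `langgraph_cua`, and `useProxy` appears only for illustration, so check the Hyperbrowser docs for the real parameter names.

```python
import re
from typing import Any


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key (e.g. "useProxy") to snake_case ("use_proxy")."""
    # Insert an underscore between a lowercase letter/digit and the uppercase letter after it.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


def to_snake_case_params(params: Any) -> Any:
    """Recursively rename dict keys to snake_case; other values pass through unchanged."""
    if isinstance(params, dict):
        return {camel_to_snake(key): to_snake_case_params(value) for key, value in params.items()}
    if isinstance(params, list):
        return [to_snake_case_params(item) for item in params]
    return params


# Hypothetical camelCase parameters, used only to illustrate the conversion.
api_style_params = {"useProxy": True, "screen": {"width": 1280, "height": 720}}
session_params = to_snake_case_params(api_style_params)
print(session_params)  # {'use_proxy': True, 'screen': {'width': 1280, 'height': 720}}
```

Nested dicts such as `screen` are converted recursively, and keys that are already snake_case are left unchanged.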

diff --git a/langgraph_cua/graph.py b/langgraph_cua/graph.py
@@ -4,7 +4,7 @@
 from langgraph.graph import END, START, StateGraph
 
 from langgraph_cua.nodes import call_model, create_vm_instance, take_computer_action
-from langgraph_cua.types import CUAConfiguration, CUAState
+from langgraph_cua.types import CUAConfiguration, CUAState, Provider
 from langgraph_cua.utils import is_computer_tool_call
 
 
@@ -30,7 +30,10 @@ def take_action_or_end(state: CUAState):
 
     tool_outputs = additional_kwargs.get("tool_outputs")
 
-    if not is_computer_tool_call(tool_outputs):
+    # Function calls are stored in the `tool_calls` attribute of the last message
+    tool_calls = getattr(last_message, "tool_calls", [])
+
+    if not is_computer_tool_call(tool_outputs) and not tool_calls:
         return END
 
     if not state.get("instance_id"):
@@ -75,7 +78,10 @@ def reinvoke_model_or_end(state: CUAState):
 
 def create_cua(
     *,
+    provider: Provider = Provider.Scrapybara,
     scrapybara_api_key: str = None,
+    hyperbrowser_api_key: str = None,
+    session_params: dict = {},
     timeout_hours: float = 1.0,
     zdr_enabled: bool = False,
     recursion_limit: int = 100,
@@ -86,8 +92,14 @@ def create_cua(
     """Configuration for the Computer Use Agent.
 
     Attributes:
+        provider: The provider to use. Default is "scrapybara".
         scrapybara_api_key: The API key to use for Scrapybara.
             This can be provided in the configuration, or set as an environment variable (SCRAPYBARA_API_KEY).
+        hyperbrowser_api_key: The API key to use for Hyperbrowser.
+            This can be provided in the configuration, or set as an environment variable (HYPERBROWSER_API_KEY).
+            Only applies if 'provider' is set to "hyperbrowser".
+        session_params: The parameters to use for the Hyperbrowser browser session.
+            Only applies if 'provider' is set to "hyperbrowser".
         timeout_hours: The number of hours to keep the virtual machine running before it times out.
             Must be between 0.01 and 24. Default is 1.
         zdr_enabled: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If True,
@@ -107,12 +119,15 @@ def create_cua(
     configured_graph = graph.with_config(
         config={
             "configurable": {
+                "provider": provider,
                 "scrapybara_api_key": scrapybara_api_key,
                 "timeout_hours": timeout_hours,
                 "zdr_enabled": zdr_enabled,
                 "auth_state_id": auth_state_id,
                 "environment": environment,
                 "prompt": prompt,
+                "hyperbrowser_api_key": hyperbrowser_api_key,
+                "session_params": session_params,
             },
             "recursion_limit": recursion_limit,
         }
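
The change to `take_action_or_end` matters because Hyperbrowser's extra function tools (`go_to_url`, `get_current_url`, `upload_file_to_element`) come back as `tool_calls` on the AI message, not as computer-use `tool_outputs`. Without the new check, a turn that only calls `go_to_url` would route to `END`. Below is a self-contained sketch of that routing decision. It makes several assumptions: `StubMessage` and the `is_computer_tool_call` stub are hypothetical stand-ins, the `"computer_call"` type check is a simplification, and the node names returned after the `END` check are assumed because those branches are not visible in this hunk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

END = "__end__"  # stand-in for langgraph.graph.END


@dataclass
class StubMessage:
    """Hypothetical stand-in for the last AI message in the graph state."""
    additional_kwargs: Dict[str, Any] = field(default_factory=dict)
    tool_calls: List[Dict[str, Any]] = field(default_factory=list)


def is_computer_tool_call(tool_outputs: Optional[List[Dict[str, Any]]]) -> bool:
    """Hypothetical stand-in: any output of type "computer_call" counts as a computer action."""
    if not tool_outputs:
        return False
    return any(output.get("type") == "computer_call" for output in tool_outputs)


def take_action_or_end(last_message: StubMessage, instance_id: Optional[str]) -> str:
    tool_outputs = last_message.additional_kwargs.get("tool_outputs")
    # Function tools surface as tool_calls; `or []` also guards against a None attribute.
    tool_calls = getattr(last_message, "tool_calls", None) or []
    if not is_computer_tool_call(tool_outputs) and not tool_calls:
        return END
    # Assumed node names: start a VM/browser session first if none exists yet.
    if not instance_id:
        return "create_vm_instance"
    return "take_computer_action"


go_to_url_call = {"name": "go_to_url", "args": {"url": "https://www.google.com"}, "id": "call_1"}
print(take_action_or_end(StubMessage(tool_calls=[go_to_url_call]), instance_id=None))  # create_vm_instance
print(take_action_or_end(StubMessage(), instance_id="vm-123"))  # __end__
```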

diff --git a/langgraph_cua/nodes/call_model.py b/langgraph_cua/nodes/call_model.py
@@ -1,10 +1,10 @@
-from typing import Any, Dict, Optional, Union
+from typing import Any, Dict, List, Optional, Union
 
 from langchain_core.messages import AIMessageChunk, SystemMessage
 from langchain_core.runnables.config import RunnableConfig
 from langchain_openai import ChatOpenAI
 
-from ..types import CUAState, get_configuration_with_defaults
+from ..types import CUAState, Provider, get_configuration_with_defaults
 
 
 def get_openai_env_from_state_env(env: str) -> str:
@@ -33,6 +33,93 @@ def get_openai_env_from_state_env(env: str) -> str:
 DEFAULT_DISPLAY_HEIGHT = 768
 
 
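+def get_available_tools(configuration: Dict[str, Any]) -> List[Dict[str, Any]]:
+    """Return the tools to bind to the model for the configured provider.
+
+    Scrapybara gets only the `computer_use_preview` tool. Hyperbrowser gets the same tool
+    (sized from `session_params["screen"]` when provided, with a "browser" environment) plus
+    the `go_to_url`, `get_current_url`, and `upload_file_to_element` function tools.
+    Raises ValueError for an unknown provider.
+    """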
+    provider = configuration.get("provider")
+    if provider == Provider.Scrapybara:
+        return [
+            {
+                "type": "computer_use_preview",
+                "display_width": DEFAULT_DISPLAY_WIDTH,
+                "display_height": DEFAULT_DISPLAY_HEIGHT,
+                "environment": get_openai_env_from_state_env(configuration.get("environment")),
+            }
+        ]
+    elif provider == Provider.Hyperbrowser:
+        session_params = configuration.get("session_params") or {}
+        default_screen = {"width": DEFAULT_DISPLAY_WIDTH, "height": DEFAULT_DISPLAY_HEIGHT}
+        screen_config = session_params.get("screen") or default_screen
+
+        return [
+            {
+                "type": "computer_use_preview",
+                "display_width": screen_config.get("width", DEFAULT_DISPLAY_WIDTH),
+                "display_height": screen_config.get("height", DEFAULT_DISPLAY_HEIGHT),
+                "environment": "browser",
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "go_to_url",
+                    "description": "Navigate to a URL. Can be used when on a blank page to go to a specific URL or search engine.",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "url": {
+                                "type": "string",
+                                "description": "The fully qualified URL to navigate to",
+                            },
+                        },
+                        "required": ["url"],
+                    },
+                },
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "get_current_url",
+                    "description": "Get the current URL",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {},
+                        "required": [],
+                    },
+                },
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "upload_file_to_element",
+                    "description": "Upload a file to an element on the page.",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "file_path": {
+                                "type": "string",
+                                "description": "The path to the file on the computer to upload",
+                            },
+                            "x": {
+                                "type": "number",
+                                "description": "The x coordinate of the element to upload the file to",
+                            },
+                            "y": {
+                                "type": "number",
+                                "description": "The y coordinate of the element to upload the file to",
+                            },
+                        },
+                        "required": ["file_path", "x", "y"],
+                    },
+                },
+            },
+        ]
+    else:
+        raise ValueError(f"Unknown provider: {provider}")
+
+
 def _prompt_to_sys_message(prompt: Union[str, SystemMessage, None]):
     if prompt is None:
         return None
@@ -74,13 +161,9 @@ async def call_model(state: CUAState, config: RunnableConfig) -> Dict[str, Any]:
         model_kwargs={"truncation": "auto", "previous_response_id": previous_response_id},
     )
 
-    tool = {
-        "type": "computer_use_preview",
-        "display_width": DEFAULT_DISPLAY_WIDTH,
-        "display_height": DEFAULT_DISPLAY_HEIGHT,
-        "environment": get_openai_env_from_state_env(environment),
-    }
-    llm_with_tools = llm.bind_tools([tool])
+    tools = get_available_tools(configuration)
+
+    llm_with_tools = llm.bind_tools(tools)
 
     response: AIMessageChunk
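
For Hyperbrowser, the `computer_use_preview` tool's `display_width`/`display_height` are taken from `session_params["screen"]` when present, presumably so that the coordinates the model returns match the browser session's actual screen size. The sketch below isolates that fallback logic as a pure function. `resolve_display_size` is a hypothetical name. The defaults are passed in explicitly because only `DEFAULT_DISPLAY_HEIGHT = 768` is visible in this diff, so the `1024` below is just a placeholder width.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_display_size(
    session_params: Optional[Dict[str, Any]],
    default_width: int,
    default_height: int,
) -> Tuple[int, int]:
    """Pick (display_width, display_height) for the computer_use_preview tool."""
    # A missing/None session_params or a missing "screen" entry means "use the defaults".
    screen = (session_params or {}).get("screen") or {}
    # Each dimension falls back independently, so {"width": 1280} keeps the default height.
    return screen.get("width", default_width), screen.get("height", default_height)


print(resolve_display_size(None, 1024, 768))  # (1024, 768)
print(resolve_display_size({"screen": {"width": 1280}}, 1024, 768))  # (1280, 768)
```

Falling back per dimension mirrors the two separate `screen_config.get(...)` calls in `get_available_tools`, so a partially specified `screen` dict still yields a complete tool definition.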