diff --git a/CHANGELOG.md b/CHANGELOG.md index 88211b877..8f9aec49b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,8 @@ ## Unreleased +- Rename `web_browser_tools()` to `web_browser()`, and don't export individual web browsing tools. +- Add `parallel` option to `@tool` decorator and specify `parallel=False` for web browsing tools. - Improve prompting for web browser tools using more explicit examples. - Improve prompting for `` end sequence for Llama models. - Fix issue with failure to execute sample setup scripts. diff --git a/docs/tools.qmd b/docs/tools.qmd index c708df6bc..8094957a9 100644 --- a/docs/tools.qmd +++ b/docs/tools.qmd @@ -219,6 +219,18 @@ my_add = tool_with(add(), parameters={"x": "the x argument"}) Note that the `tool_with()` function returns a copy of the passed tool with modified descriptions (the passed tool retains its original descriptions). +## Parallel Tool Calls + +Some model APIs, including OpenAI and Gemini, support executing multiple tool calls in parallel. While this can provide a performance improvement, it might not be compatible with the semantics of some tools (for example, if they manage some global state between calls). + +You can opt out of parallel tool calling by adding `parallel=False` to the `@tool` decorator. For example, the built-in web browsing tools do this as follows: + +```python +@tool(parallel=False) +def web_browser_go() -> Tool: + ... +``` + ## Bash and Python {#sec-bash-and-python} The `bash()` and `python()` tools enable execution of arbitrary shell commands and Python code, respectively. These tools require the use of a [Sandbox Environment](#sec-sandbox-environments) for the execution of untrusted code. 
For example, here is how you might use them in an evaluation where the model is asked to write code in order to solve capture the flag (CTF) challenges: @@ -253,14 +265,14 @@ See the [Agents](#sec-agents) section for more details on how to build evaluatio ## Web Browser {#sec-web-browser} ::: {.callout-note appearance="simple"} -Note that the web browser tool described below is currently only available in the development version of Inspect. You can install the development version with: +Note that the web browser tools described below are currently only available in the development version of Inspect. You can install the development version with: ``` bash pip install git+https://github.com/UKGovernmentBEIS/inspect_ai ``` ::: -The web browser tool provides models with the ability to browse the web using a headless Chromium browser. Navigation, history, and mouse/keyboard interactions are all supported. +The web browser tools provide models with the ability to browse the web using a headless Chromium browser. Navigation, history, and mouse/keyboard interactions are all supported. ### Configuration @@ -290,20 +302,20 @@ Rather than using the `inspect_web_browser` image, you can also just include the ### Task -A task configured to use the web browser tool might look like this: +A task configured to use the web browser tools might look like this: ``` python from inspect_ai import Task, task from inspect_ai.scorer import match from inspect_ai.solver import generate, use_tools -from inspect_ai.tool import bash, web_browser_tools +from inspect_ai.tool import bash, python, web_browser @task def browser_task(): return Task( dataset=read_dataset(), solver=[ - use_tools([bash()] + web_browser_tools()), + use_tools([bash(), python()] + web_browser()), generate(), ], scorer=match(), @@ -311,7 +323,7 @@ def browser_task(): ) ``` -Note that we pass `web_browser_tools()` to `use_tools()`, which provides a list of web browsing tools (e.g. 
`web_browser_go()`, `web_browser_click()`, etc.). +Note that unlike tool functions such as `bash()`, the `web_browser()` function returns a list of tools. Therefore, we concatenate it with a list of the other tools we are using in the call to `use_tools()`. ### Browsing @@ -319,18 +331,17 @@ If you review the transcripts of a sample with access to the web browser tool, y | Tool | Description | |------------------------------------|------------------------------------| -| `web_browser_go()` | Navigate the web browser to a URL. | -| `web_browser_click()` | Click an element on the page currently displayed by the web browser. | -| `web_browser_scroll()` | Scroll the web browser up or down by one page. | +| `web_browser_go(url)` | Navigate the web browser to a URL. | +| `web_browser_click(element_id)` | Click an element on the page currently displayed by the web browser. | +| `web_browser_type(element_id, text)` | Type text into an input on a web browser page. | +| `web_browser_type_submit(element_id, text)` | Type text into a form input on a web browser page and press ENTER to submit the form. | +| `web_browser_scroll(direction)` | Scroll the web browser up or down by one page. | | `web_browser_forward()` | Navigate the web browser forward in the browser history. | | `web_browser_back()` | Navigate the web browser back in the browser history. | | `web_browser_refresh()` | Refresh the current page of the web browser. | -| `web_browser_type()` | Type text into an input on a web browser page. | -| `web_browser_type_submit()` | Type text into a form input on a web browser page and press ENTER to submit the form. | : {tbl-colwidths=\[35,65\]} -If you like, you can enable a subset of these tools rather than calling `web_browser_tools()` to use all of them. 
The return value of each of these tools is a [web accessibility tree](https://web.dev/articles/the-accessibility-tree) for the page, which provides a clean view of the content, links, and form fields available on the page (you can look at the accessibility tree for any web page using [Chrome Developer Tools](https://developer.chrome.com/blog/full-accessibility-tree)). @@ -356,6 +367,7 @@ COPY *.py ./ CMD ["python3", "/app/web_browser/web_server.py"] ``` +Note that all of the Python files in the [_resources](https://github.com/UKGovernmentBEIS/inspect_ai/blob/main/src/inspect_ai/tool/_tools/_web_browser/_resources/) directory alongside the `Dockerfile` need to be available for copying when building the container. ## Web Search {#sec-web-search} diff --git a/examples/browser/browser.py b/examples/browser/browser.py index e6cd1a894..c90daf16d 100644 --- a/examples/browser/browser.py +++ b/examples/browser/browser.py @@ -3,7 +3,7 @@ from inspect_ai.scorer import includes from inspect_ai.solver import generate from inspect_ai.solver._use_tools import use_tools -from inspect_ai.tool import web_browser_tools +from inspect_ai.tool import web_browser @task @@ -15,7 +15,7 @@ def browser(): ) ], solver=[ - use_tools(web_browser_tools()), + use_tools(web_browser()), generate(), ], scorer=includes(), diff --git a/src/inspect_ai/model/_call_tools.py b/src/inspect_ai/model/_call_tools.py index 8ce01f8ea..b0c4d977f 100644 --- a/src/inspect_ai/model/_call_tools.py +++ b/src/inspect_ai/model/_call_tools.py @@ -23,7 +23,7 @@ from inspect_ai._util.registry import registry_info from inspect_ai._util.text import truncate_string_to_bytes from inspect_ai.tool import Tool, ToolCall, ToolError, ToolInfo -from inspect_ai.tool._tool import TOOL_PROMPT, ToolParsingError +from inspect_ai.tool._tool import TOOL_PARALLEL, TOOL_PROMPT, ToolParsingError from inspect_ai.tool._tool_call import ToolCallError from inspect_ai.tool._tool_info import ( ToolParams, @@ -142,9 +142,15 @@ async def 
call_tool_task(call: ToolCall) -> tuple[ChatMessageTool, ToolEvent]: error=tool_error, ), event - # call tools in parallel - tasks = [call_tool_task(call) for call in message.tool_calls] - results = await asyncio.gather(*tasks) + # call tools in parallel if compatible + results: list[tuple[ChatMessageTool, ToolEvent]] = [] + if all([tool_def.parallel for tool_def in tdefs]): + tasks = [call_tool_task(call) for call in message.tool_calls] + results = await asyncio.gather(*tasks) + # otherwise call serially + else: + for call in message.tool_calls: + results.append(await call_tool_task(call)) # fire tool events for each result for event in [result[1] for result in results]: @@ -168,6 +174,9 @@ class ToolDef: parameters: ToolParams """Tool parameters""" + parallel: bool + """Supports parallel execution.""" + tool: Callable[..., Any] """Callable to execute tool.""" @@ -221,7 +230,7 @@ def tool_defs(tools: list[Tool]) -> list[ToolDef]: def tool_def(tool: Tool) -> ToolDef: # get tool_info - name, prompt = tool_name_and_prompt(tool) + name, prompt, parallel = tool_registry_info(tool) tool_info = parse_tool_info(tool) # if there is a description then append any prompt to the @@ -269,15 +278,17 @@ def raise_not_provided_error(context: str) -> None: name=name, description=tool_info.description, parameters=tool_info.parameters, + parallel=parallel, tool=tool, ) -def tool_name_and_prompt(tool: Tool) -> tuple[str, str | None]: - tool_registry_info = registry_info(tool) - name = tool_registry_info.name.split("/")[-1] - prompt = tool_registry_info.metadata.get(TOOL_PROMPT, None) - return name, prompt +def tool_registry_info(tool: Tool) -> tuple[str, str | None, bool]: + info = registry_info(tool) + name = info.name.split("/")[-1] + prompt = info.metadata.get(TOOL_PROMPT, None) + parallel = info.metadata.get(TOOL_PARALLEL, True) + return name, prompt, parallel def tool_params(input: dict[str, Any], func: Callable[..., Any]) -> dict[str, Any]: diff --git 
a/src/inspect_ai/tool/__init__.py b/src/inspect_ai/tool/__init__.py index 1022f8ab0..1b0fa874c 100644 --- a/src/inspect_ai/tool/__init__.py +++ b/src/inspect_ai/tool/__init__.py @@ -7,31 +7,13 @@ from ._tool_info import ToolInfo, ToolParam, ToolParams from ._tool_with import tool_with from ._tools._execute import bash, python -from ._tools._web_browser import ( - web_browser_back, - web_browser_click, - web_browser_forward, - web_browser_go, - web_browser_refresh, - web_browser_scroll, - web_browser_tools, - web_browser_type, - web_browser_type_submit, -) +from ._tools._web_browser import web_browser from ._tools._web_search import web_search __all__ = [ "bash", "python", - "web_browser_tools", - "web_browser_go", - "web_browser_click", - "web_browser_scroll", - "web_browser_forward", - "web_browser_back", - "web_browser_refresh", - "web_browser_type", - "web_browser_type_submit", + "web_browser", "web_search", "tool", "tool_with", @@ -53,6 +35,7 @@ _UTIL_MODULE_VERSION = "0.3.19" _REMOVED_IN = "0.4" + relocated_module_attribute( "ToolEnvironment", "inspect_ai.util.SandboxEnvironment", @@ -80,3 +63,9 @@ relocated_module_attribute( "toolenv", "inspect_ai.util.sandboxenv", _UTIL_MODULE_VERSION, _REMOVED_IN ) +relocated_module_attribute( + "web_browser_tools", + "inspect_ai.tool.web_browser", + "0.3.19", + _REMOVED_IN, +) diff --git a/src/inspect_ai/tool/_tool.py b/src/inspect_ai/tool/_tool.py index 575e16012..047c9c2d6 100644 --- a/src/inspect_ai/tool/_tool.py +++ b/src/inspect_ai/tool/_tool.py @@ -94,12 +94,16 @@ def tool() -> Callable[[ToolType], ToolType]: ... @overload def tool( - *, name: str | None = None, prompt: str | None = None + *, name: str | None = None, parallel: bool = True, prompt: str | None = None ) -> Callable[[ToolType], ToolType]: ... 
def tool( - func: ToolType | None = None, *, name: str | None = None, prompt: str | None = None + func: ToolType | None = None, + *, + name: str | None = None, + parallel: bool = True, + prompt: str | None = None, ) -> ToolType | Callable[[ToolType], ToolType]: r"""Decorator for registering tools. @@ -109,6 +113,9 @@ def tool( Optional name for tool. If the decorator has no name argument then the name of the tool creation function will be used as the name of the tool. + parallel (bool): + Does this tool support parallel execution? + (defaults to True). prompt (str): Deprecated (provide all descriptive information about the tool within the tool function's doc comment) @@ -143,7 +150,7 @@ def tool_wrapper(*args: Any, **kwargs: Any) -> Tool: RegistryInfo( type="tool", name=tool_name, - metadata={TOOL_PROMPT: prompt}, + metadata={TOOL_PROMPT: prompt, TOOL_PARALLEL: parallel}, ), *args, **kwargs, @@ -160,3 +167,4 @@ def tool_wrapper(*args: Any, **kwargs: Any) -> Tool: TOOL_PROMPT = "prompt" +TOOL_PARALLEL = "parallel" diff --git a/src/inspect_ai/tool/_tools/_web_browser/__init__.py b/src/inspect_ai/tool/_tools/_web_browser/__init__.py index 20b00836a..ebc106e7a 100644 --- a/src/inspect_ai/tool/_tools/_web_browser/__init__.py +++ b/src/inspect_ai/tool/_tools/_web_browser/__init__.py @@ -1,23 +1,3 @@ -from ._web_browser import ( - web_browser_back, - web_browser_click, - web_browser_forward, - web_browser_go, - web_browser_refresh, - web_browser_scroll, - web_browser_tools, - web_browser_type, - web_browser_type_submit, -) +from ._web_browser import web_browser -__all__ = [ - "web_browser_tools", - "web_browser_go", - "web_browser_click", - "web_browser_scroll", - "web_browser_forward", - "web_browser_back", - "web_browser_refresh", - "web_browser_type", - "web_browser_type_submit", -] +__all__ = ["web_browser"] diff --git a/src/inspect_ai/tool/_tools/_web_browser/_web_browser.py b/src/inspect_ai/tool/_tools/_web_browser/_web_browser.py index 8df303323..dc0c62f59 100644 
--- a/src/inspect_ai/tool/_tools/_web_browser/_web_browser.py +++ b/src/inspect_ai/tool/_tools/_web_browser/_web_browser.py @@ -7,7 +7,7 @@ from inspect_ai.util._sandbox.docker.internal import INSPECT_WEB_BROWSER_IMAGE -def web_browser_tools() -> list[Tool]: +def web_browser() -> list[Tool]: """Tools used for web browser navigation. Returns: @@ -26,7 +26,7 @@ def web_browser_tools() -> list[Tool]: ] -@tool +@tool(parallel=False) def web_browser_go() -> Tool: """Web Browser tool for navigation to a URL. @@ -63,7 +63,7 @@ async def execute(url: str) -> str: return execute -@tool +@tool(parallel=False) def web_browser_click() -> Tool: """Web Browser tool for clicking an element on a web page. @@ -98,7 +98,7 @@ async def execute(element_id: int) -> str: return execute -@tool +@tool(parallel=False) def web_browser_type_submit() -> Tool: """Web Browser tool for typing and submitting input. @@ -136,7 +136,7 @@ async def execute(element_id: int, text: str) -> str: return execute -@tool +@tool(parallel=False) def web_browser_type() -> Tool: """Web Browser tool for typing into inputs. @@ -174,7 +174,7 @@ async def execute(element_id: int, text: str) -> str: return execute -@tool +@tool(parallel=False) def web_browser_scroll() -> Tool: """Web Browser tool for scrolling up or down one page. @@ -204,7 +204,7 @@ async def execute(direction: str) -> str: return execute -@tool +@tool(parallel=False) def web_browser_back() -> Tool: """Web Browser tool for navigating back in the browser history. @@ -225,7 +225,7 @@ async def execute() -> str: return execute -@tool +@tool(parallel=False) def web_browser_forward() -> Tool: """Web Browser tool for navigating forward in the browser history. @@ -246,7 +246,7 @@ async def execute() -> str: return execute -@tool +@tool(parallel=False) def web_browser_refresh() -> Tool: """Web Browser tool for refreshing the current page. 
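Review note: the `_tool.py` and `_web_browser.py` changes above amount to storing a `parallel` flag in each tool's registry metadata and reading it back with a default of `True`. A minimal standalone sketch of that pattern follows. It does not use `inspect_ai`; the `_METADATA` dict, the simplified `tool` decorator, and the `supports_parallel()` helper are hypothetical stand-ins for illustration only, and `add` and `browser_go` are made-up tool names.

```python
from typing import Any, Callable, Optional

TOOL_PARALLEL = "parallel"

# Hypothetical stand-in for a tool registry: tool name -> metadata.
_METADATA: dict[str, dict[str, Any]] = {}


def tool(func: Optional[Callable[..., Any]] = None, *, parallel: bool = True) -> Any:
    """Register a tool factory and record whether it tolerates parallel calls."""

    def register(f: Callable[..., Any]) -> Callable[..., Any]:
        _METADATA[f.__name__] = {TOOL_PARALLEL: parallel}
        return f

    # Support both the bare form (@tool) and the call form (@tool(parallel=False)).
    return register(func) if func is not None else register


def supports_parallel(name: str) -> bool:
    # Missing metadata is treated as parallel-safe, mirroring the
    # `metadata.get(TOOL_PARALLEL, True)` default in the diff.
    return _METADATA.get(name, {}).get(TOOL_PARALLEL, True)


@tool
def add() -> None:
    ...


@tool(parallel=False)
def browser_go() -> None:
    ...
```

Defaulting to `True` on lookup means tools registered before the flag existed keep their previous (parallel) behavior.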
diff --git a/tests/tools/test_web_browser.py b/tests/tools/test_web_browser.py index 77cd92c1e..13dc42b3e 100644 --- a/tests/tools/test_web_browser.py +++ b/tests/tools/test_web_browser.py @@ -9,7 +9,7 @@ from inspect_ai.dataset import Sample from inspect_ai.model import ModelOutput, get_model from inspect_ai.solver import generate, use_tools -from inspect_ai.tool import web_browser_tools +from inspect_ai.tool import web_browser from inspect_ai.util import SandboxEnvironmentSpec @@ -18,8 +18,8 @@ def test_web_browser_navigation(): task = Task( dataset=[Sample(input="Please use the web_browser tool")], - solver=[use_tools(web_browser_tools()), generate()], - sandbox=test_sandbox(), + solver=[use_tools(web_browser()), generate()], + sandbox=web_browser_sandbox(), ) log = eval( @@ -113,8 +113,8 @@ def test_web_browser_click(): input="Please use the web browser tool to navigate to https://inspect.ai-safety-institute.org.uk/. Then, once there, use the web_browser_click tool to click the link to the documentation on Solvers." ) ], - solver=[use_tools(web_browser_tools()), generate()], - sandbox=test_sandbox(), + solver=[use_tools(web_browser()), generate()], + sandbox=web_browser_sandbox(), ) log = eval(task, model="openai/gpt-4o")[0] @@ -136,8 +136,8 @@ def test_web_browser_input(): input="Please use the web browser tool to navigate to https://inspect.ai-safety-institute.org.uk/. Then, once there, use the page's search interface to search for 'solvers'" ) ], - solver=[use_tools(web_browser_tools()), generate()], - sandbox=test_sandbox(), + solver=[use_tools(web_browser()), generate()], + sandbox=web_browser_sandbox(), ) log = eval(task, model="openai/gpt-4o")[0] @@ -146,7 +146,7 @@ def test_web_browser_input(): assert type_call -def test_sandbox() -> SandboxEnvironmentSpec: +def web_browser_sandbox() -> SandboxEnvironmentSpec: return ( "docker", (Path(__file__).parent / "test_web_browser_compose.yaml").as_posix(),
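Review note: the dispatch change in `_call_tools.py` can be exercised in isolation. The sketch below is a simplified illustration under stated assumptions, not the library's code: this `ToolDef` is a reduced, hypothetical version with only three fields, calls are modeled as `(tool_name, argument)` tuples, and `call_tools()` is an illustrative name. It runs calls concurrently with `asyncio.gather` only when every available tool is marked `parallel`, and otherwise awaits them one at a time.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ToolDef:
    # Reduced, hypothetical version of the ToolDef in the diff.
    name: str
    parallel: bool
    tool: Callable[[str], Awaitable[str]]


async def call_tools(
    tdefs: list[ToolDef], calls: list[tuple[str, str]]
) -> list[str]:
    by_name = {tdef.name: tdef for tdef in tdefs}

    async def call_one(call: tuple[str, str]) -> str:
        name, argument = call
        return await by_name[name].tool(argument)

    # Run concurrently only if every available tool allows it.
    if all(tdef.parallel for tdef in tdefs):
        return list(await asyncio.gather(*(call_one(call) for call in calls)))

    # Otherwise finish each call before starting the next.
    results: list[str] = []
    for call in calls:
        results.append(await call_one(call))
    return results
```

As in the diff, the check covers all tool definitions passed in, not only the tools named in the current batch of calls, so a single `parallel=False` tool makes every batch run serially. Results come back in call order in both branches.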