Commit

Built site for gh-pages
jjallaire committed Dec 3, 2024
1 parent 148bc13 commit 01911fc
Showing 41 changed files with 374 additions and 1,863 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
c69b92b9
9003719b
2 changes: 1 addition & 1 deletion agents-api.html
@@ -348,7 +348,7 @@ <h2 id="toc-title">On this page</h2>
</ul></li>
<li><a href="#sandboxing" id="toc-sandboxing" class="nav-link" data-scroll-target="#sandboxing">Sandboxing</a></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="agents-api.html.lmd"><i class="bi bi-file-code"></i>CommonMark</a></li></ul></div></nav>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="agents-api.html.lmd"><i class="bi bi-file-code"></i>Github (GFM)</a></li></ul></div></nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content">
47 changes: 8 additions & 39 deletions agents-api.html.md
@@ -175,45 +175,14 @@ if output.stop_reason == "model_length":

Here are the possible values for `StopReason` :

<table>
<colgroup>
<col style="width: 35%" />
<col style="width: 65%" />
</colgroup>
<thead>
<tr class="header">
<th>Stop Reason</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>stop</code></td>
<td>The model hit a natural stop point or a provided stop sequence</td>
</tr>
<tr class="even">
<td><code>max_tokens</code></td>
<td>The maximum number of tokens specified in the request was
reached.</td>
</tr>
<tr class="odd">
<td><code>model_length</code></td>
<td>The model’s context length was exceeded.</td>
</tr>
<tr class="even">
<td><code>tool_calls</code></td>
<td>The model called a tool</td>
</tr>
<tr class="odd">
<td><code>content_filter</code></td>
<td>Content was omitted due to a content filter.</td>
</tr>
<tr class="even">
<td><code>unknown</code></td>
<td>Unknown (e.g. unexpected runtime error)</td>
</tr>
</tbody>
</table>
| Stop Reason | Description |
|----|----|
| `stop` | The model hit a natural stop point or a provided stop sequence |
| `max_tokens` | The maximum number of tokens specified in the request was reached. |
| `model_length` | The model’s context length was exceeded. |
| `tool_calls` | The model called a tool |
| `content_filter` | Content was omitted due to a content filter. |
| `unknown` | Unknown (e.g. unexpected runtime error) |
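
As an illustrative sketch (not part of this commit), the values in the table above can be modelled as a `Literal` type and mapped to follow-up actions. The `StopReason` alias below is a local stand-in rather than an import from Inspect, and the `next_action` helper and its return strings are hypothetical:

```python
from typing import Literal, get_args

# Local stand-in mirroring the table above; not imported from inspect_ai.
StopReason = Literal[
    "stop", "max_tokens", "model_length", "tool_calls", "content_filter", "unknown"
]


def next_action(stop_reason: StopReason) -> str:
    """Hypothetical helper: choose what an agent loop might do next."""
    if stop_reason == "tool_calls":
        return "run_tools"          # the model asked for a tool to be executed
    if stop_reason in ("max_tokens", "model_length"):
        return "truncate_or_abort"  # output or context budget was exhausted
    if stop_reason == "content_filter":
        return "report_filtered"    # content was omitted by a filter
    if stop_reason == "stop":
        return "finish"             # natural stop point or stop sequence
    return "log_and_retry"          # "unknown", e.g. unexpected runtime error
```

Branching on every value explicitly, with `unknown` as the fallback, keeps the loop from silently treating a truncated response as a finished one.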

### Error Handling

2 changes: 1 addition & 1 deletion agents.html
@@ -355,7 +355,7 @@ <h2 id="toc-title">On this page</h2>
<li><a href="#troubleshooting" id="toc-troubleshooting" class="nav-link" data-scroll-target="#troubleshooting">Troubleshooting</a></li>
</ul></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="agents.html.lmd"><i class="bi bi-file-code"></i>CommonMark</a></li></ul></div></nav>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="agents.html.lmd"><i class="bi bi-file-code"></i>Github (GFM)</a></li></ul></div></nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content">
205 changes: 31 additions & 174 deletions agents.html.md
@@ -129,76 +129,18 @@ repository at
There are several options available for customising the behaviour of the
basic agent:

<table style="width:93%;">
<colgroup>
<col style="width: 23%" />
<col style="width: 20%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="header">
<th>Option</th>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>init</code></td>
<td><code>Solver | list[Solver]</code></td>
<td>Agent initialisation (e.g. <code>system_message()</code>).</td>
</tr>
<tr class="even">
<td><code>tools</code></td>
<td><code>list[Tool]</code></td>
<td>List of tools available to the agent.</td>
</tr>
<tr class="odd">
<td><code>max_attempts</code></td>
<td><code>int</code></td>
<td>Maximum number of submission attempts to accept.</td>
</tr>
<tr class="even">
<td><code>message_limit</code></td>
<td><code>int</code></td>
<td>Limit on messages in conversation before terminating agent.</td>
</tr>
<tr class="odd">
<td><code>token_limit</code></td>
<td><code>int</code></td>
<td>Limit on tokens in conversation before terminating agent.</td>
</tr>
<tr class="even">
<td><code>score_value</code></td>
<td><code>ValueToFloat</code></td>
<td>Function used to extract values from scores (defaults to standard
<code>value_to_float()</code>).</td>
</tr>
<tr class="odd">
<td><code>incorrect_message</code></td>
<td><code>str</code></td>
<td>User message reply for an incorrect submission from the model.
Alternatively, a function which returns a message.</td>
</tr>
<tr class="even">
<td><code>continue_message</code></td>
<td><code>str</code></td>
<td>User message to urge the model to continue when it doesn’t make a
tool call.</td>
</tr>
<tr class="odd">
<td><code>submit_name</code></td>
<td><code>str</code></td>
<td>Name for tool used to make submissions (defaults to ‘submit’).</td>
</tr>
<tr class="even">
<td><code>submit_description</code></td>
<td><code>str</code></td>
<td>Description of submit tool (defaults to ‘Submit an answer for
evaluation’)</td>
</tr>
</tbody>
</table>
| Option | Type | Description |
|----|----|----|
| `init` | `Solver | list[Solver]` | Agent initialisation (e.g. `system_message()`). |
| `tools` | `list[Tool]` | List of tools available to the agent. |
| `max_attempts` | `int` | Maximum number of submission attempts to accept. |
| `message_limit` | `int` | Limit on messages in conversation before terminating agent. |
| `token_limit` | `int` | Limit on tokens in conversation before terminating agent. |
| `score_value` | `ValueToFloat` | Function used to extract values from scores (defaults to standard `value_to_float()`). |
| `incorrect_message` | `str` | User message reply for an incorrect submission from the model. Alternatively, a function which returns a message. |
| `continue_message` | `str` | User message to urge the model to continue when it doesn’t make a tool call. |
| `submit_name` | `str` | Name for tool used to make submissions (defaults to ‘submit’). |
| `submit_description` | `str` | Description of submit tool (defaults to ‘Submit an answer for evaluation’) |

For multiple attempts, submissions are evaluated using the task’s main
scorer, with value of 1.0 indicating a correct answer. Scorer values are
@@ -283,45 +225,14 @@ if output.stop_reason == "model_length":

Here are the possible values for `StopReason` :

<table>
<colgroup>
<col style="width: 35%" />
<col style="width: 65%" />
</colgroup>
<thead>
<tr class="header">
<th>Stop Reason</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>stop</code></td>
<td>The model hit a natural stop point or a provided stop sequence</td>
</tr>
<tr class="even">
<td><code>max_tokens</code></td>
<td>The maximum number of tokens specified in the request was
reached.</td>
</tr>
<tr class="odd">
<td><code>model_length</code></td>
<td>The model’s context length was exceeded.</td>
</tr>
<tr class="even">
<td><code>tool_calls</code></td>
<td>The model called a tool</td>
</tr>
<tr class="odd">
<td><code>content_filter</code></td>
<td>Content was omitted due to a content filter.</td>
</tr>
<tr class="even">
<td><code>unknown</code></td>
<td>Unknown (e.g. unexpected runtime error)</td>
</tr>
</tbody>
</table>
| Stop Reason | Description |
|----|----|
| `stop` | The model hit a natural stop point or a provided stop sequence |
| `max_tokens` | The maximum number of tokens specified in the request was reached. |
| `model_length` | The model’s context length was exceeded. |
| `tool_calls` | The model called a tool |
| `content_filter` | Content was omitted due to a content filter. |
| `unknown` | Unknown (e.g. unexpected runtime error) |

### Error Handling

@@ -458,9 +369,7 @@ class InspectChatModel(BaseChatModel):
...
```

<div>

> **Note**
> [!NOTE]
>
> Note that the `inspect_langchain` module imported here is not a
> built in feature of Inspect. Rather, you can find its [source
@@ -469,8 +378,6 @@ class InspectChatModel(BaseChatModel):
> agents or as the basis for creating similar integrations with other
> agent frameworks.
</div>

Now here’s the `wikipedia_search()` solver (imports again excluded for
brevity):

@@ -722,32 +629,10 @@ The sandbox is also available to custom scorers.

There are two sandbox environments built in to Inspect:

<table>
<colgroup>
<col style="width: 36%" />
<col style="width: 63%" />
</colgroup>
<thead>
<tr class="header">
<th>Environment Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>local</code></td>
<td>Run <code>sandbox()</code> methods in the same file system as the
running evaluation (should <em>only be used</em> if you are already
running your evaluation in another sandbox).</td>
</tr>
<tr class="even">
<td><code>docker</code></td>
<td>Run <code>sandbox()</code> methods within a Docker container (see
the <a href="#sec-docker-configuration">Docker Configuration</a> section
below for additional details).</td>
</tr>
</tbody>
</table>
| Environment Type | Description |
|----|----|
| `local` | Run `sandbox()` methods in the same file system as the running evaluation (should *only be used* if you are already running your evaluation in another sandbox). |
| `docker` | Run `sandbox()` methods within a Docker container (see the [Docker Configuration](#sec-docker-configuration) section below for additional details). |

Sandbox environment definitions can be bound at the `Sample`, `Task`, or
`eval()` level. Binding precedence goes from `eval()`, to `Task` to
@@ -832,35 +717,11 @@ file (`compose.yaml`).
Here is how Docker sandbox environments are created based on the
presence of `Dockerfile` and/or `compose.yml` in the task directory:

<table>
<colgroup>
<col style="width: 37%" />
<col style="width: 62%" />
</colgroup>
<thead>
<tr class="header">
<th>Config Files</th>
<th>Behavior</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>None</td>
<td>Creates a sandbox environment based on the official <a
href="https://hub.docker.com/_/python">python:3.12-bookworm</a>
image.</td>
</tr>
<tr class="even">
<td><code>Dockerfile</code></td>
<td>Creates a sandbox environment by building the image.</td>
</tr>
<tr class="odd">
<td><code>compose.yaml</code></td>
<td>Creates sandbox environment(s) based on
<code>compose.yaml</code>.</td>
</tr>
</tbody>
</table>
| Config Files | Behavior |
|----|----|
| None | Creates a sandbox environment based on the official [python:3.12-bookworm](https://hub.docker.com/_/python) image. |
| `Dockerfile` | Creates a sandbox environment by building the image. |
| `compose.yaml` | Creates sandbox environment(s) based on `compose.yaml`. |
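
As a rough sketch (not part of this commit), the lookup described by the table can be expressed as a file-presence check. The `docker_sandbox_source` helper is hypothetical, and the assumption that `compose.yaml` takes precedence when both files are present is mine, not something the table states:

```python
from pathlib import Path


def docker_sandbox_source(task_dir: Path) -> str:
    """Hypothetical helper: report which config the table would select."""
    # Assumption: compose.yaml wins when both files exist.
    if (task_dir / "compose.yaml").exists():
        return "compose.yaml"
    if (task_dir / "Dockerfile").exists():
        return "Dockerfile"
    # No config files: fall back to the official Python image from the table.
    return "python:3.12-bookworm"
```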

Providing a `compose.yaml` is not strictly required, as Inspect will
automatically generate one as needed. Note that the automatically
@@ -989,18 +850,14 @@ sandbox() # default sandbox environment
sandbox("victim") # named sandbox environment
```

<div>

> **Note**
> [!NOTE]
>
> If you define multiple sandbox environments you are *required* to name
> one of them “default” so that Inspect knows which environment to
> resolve for calls to `sandbox()` without an argument. Alternatively,
> you can add the `x-default` key to a service not named “default” to
> designate it as the default sandbox.

</div>

#### Infrastructure

Note that in many cases you’ll want to provision additional
2 changes: 1 addition & 1 deletion approval.html
@@ -336,7 +336,7 @@ <h2 id="toc-title">On this page</h2>
<li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li>
</ul></li>
</ul>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="approval.html.lmd"><i class="bi bi-file-code"></i>CommonMark</a></li></ul></div></nav>
<div class="toc-actions"><ul><li><a href="https://github.com/UKGovernmentBEIS/inspect_ai/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div><div class="quarto-alternate-formats"><h2>Other Formats</h2><ul><li><a href="approval.html.lmd"><i class="bi bi-file-code"></i>Github (GFM)</a></li></ul></div></nav>
</div>
<!-- main -->
<main class="content" id="quarto-document-content">
45 changes: 7 additions & 38 deletions approval.html.md
@@ -116,44 +116,13 @@ def auto_approver(decision: ApprovalDecision = "approve") -> Approver:

There are five possible approval decisions:

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr class="header">
<th>Decision</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>approve</td>
<td>The tool call is approved</td>
</tr>
<tr class="even">
<td>modify</td>
<td>The tool call is approved with modification (included in
<code>modified</code> field of <code>Approver</code>)</td>
</tr>
<tr class="odd">
<td>reject</td>
<td>The tool call is rejected (report to the model that the call was
rejected along with an explanation)</td>
</tr>
<tr class="even">
<td>escalate</td>
<td>The tool call should be escalated to the next approver in the
chain.</td>
</tr>
<tr class="odd">
<td>terminate</td>
<td>The current sample should be terminated as a result of the tool
call.</td>
</tr>
</tbody>
</table>
| Decision | Description |
|----|----|
| approve | The tool call is approved |
| modify | The tool call is approved with modification (included in `modified` field of `Approver`) |
| reject | The tool call is rejected (report to the model that the call was rejected along with an explanation) |
| escalate | The tool call should be escalated to the next approver in the chain. |
| terminate | The current sample should be terminated as a result of the tool call. |
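
To make the `escalate` decision concrete, here is a minimal sketch (not part of this commit) of how a chain of approvers might be resolved. The `ApprovalDecision` alias is a local stand-in, and `ApproverFn`, `resolve`, and the sample approvers are hypothetical. Rejecting the call when every approver escalates is an assumption of this sketch, not documented behaviour:

```python
from typing import Callable, Literal

# Local stand-in for the five decisions in the table above.
ApprovalDecision = Literal["approve", "modify", "reject", "escalate", "terminate"]

# Hypothetical approver signature: takes a tool call description, returns a decision.
ApproverFn = Callable[[str], ApprovalDecision]


def resolve(tool_call: str, approvers: list[ApproverFn]) -> ApprovalDecision:
    """Walk the chain; the first decision other than 'escalate' is final."""
    for approver in approvers:
        decision = approver(tool_call)
        if decision != "escalate":
            return decision
    # Assumption: if no approver makes a final decision, reject the call.
    return "reject"


def allow_ls(tool_call: str) -> ApprovalDecision:
    # Approve a known-safe command, pass everything else down the chain.
    return "approve" if tool_call.startswith("ls") else "escalate"


def deny_all(tool_call: str) -> ApprovalDecision:
    return "reject"
```

With these definitions, `resolve("ls -la", [allow_ls, deny_all])` is decided by the first approver, while `resolve("rm -rf /", [allow_ls, deny_all])` is escalated to the second one.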

Here’s a more complicated custom approver that implements an allow list
for bash commands. Imagine that we’ve implemented this approver within a

0 comments on commit 01911fc
