Built site for gh-pages
aisi-inspect committed Jun 24, 2024
1 parent 4b06132 commit 033fb31
Showing 14 changed files with 360 additions and 238 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
Original file line number Diff line number Diff line change
@@ -1 +1 @@
87e392db
66b71204
178 changes: 111 additions & 67 deletions agents.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion eval-logs.html
@@ -1081,7 +1081,7 @@ <h3 class="anchored" data-anchor-id="reading-logs">Reading Logs</h3>
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","selector":".lightbox","openEffect":"zoom","loop":false,"closeEffect":"zoom"});
window.onload = () => {
lightboxQuarto.on('slide_before_load', (data) => {
const { slideIndex, slideNode, slideConfig, player, trigger } = data;
11 changes: 9 additions & 2 deletions eval-tuning.html
@@ -281,6 +281,7 @@ <h2 id="toc-title">Table of contents</h2>

<ul>
<li><a href="#overview" id="toc-overview" class="nav-link active" data-scroll-target="#overview">Overview</a></li>
<li><a href="#max-samples" id="toc-max-samples" class="nav-link" data-scroll-target="#max-samples">Max Samples</a></li>
<li><a href="#model-apis" id="toc-model-apis" class="nav-link" data-scroll-target="#model-apis">Model APIs</a>
<ul class="collapse">
<li><a href="#max-connections" id="toc-max-connections" class="nav-link" data-scroll-target="#max-connections">Max Connections</a></li>
@@ -324,9 +325,15 @@ <h1 class="title"><span id="sec-eval-tuning" class="quarto-section-identifier"><

<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>Inspect runs evaluations using a highly parallel async architecture. Rather than processing a batch at a time, many samples are processed concurrently. This is possible because evaluations generally require relatively little local compute, instead spending most of their time waiting for model API calls and web requests to complete. Consequently, Inspect eagerly executes as much local computation as it can, while ensuring that model APIs are not over-saturated by enforcing a maximum number of concurrent connections.</p>
<p>This section describes how to tune Inspect’s concurrency, as well as how to handle situations where more local compute is required.</p>
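The throttling scheme described above can be illustrated with a minimal asyncio sketch. This is not Inspect's actual implementation and all names are illustrative: every sample is scheduled eagerly, while a semaphore caps how many model calls are in flight at once.

```python
import asyncio

async def run_eval(num_samples: int, max_connections: int) -> list[str]:
    # Cap on concurrent model API calls; all samples are scheduled eagerly
    # and the semaphore does the throttling.
    connections = asyncio.Semaphore(max_connections)

    async def call_model(sample: int) -> str:
        async with connections:
            await asyncio.sleep(0.01)  # stands in for a model API request
            return f"completion-{sample}"

    return await asyncio.gather(*(call_model(i) for i in range(num_samples)))

results = asyncio.run(run_eval(num_samples=100, max_connections=10))
```

All 100 coroutines exist from the start, but at most 10 are ever inside a "connection" at once; local work outside the semaphore proceeds unimpeded.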
</section>
<section id="max-samples" class="level2">
<h2 class="anchored" data-anchor-id="max-samples">Max Samples</h2>
<p>The <code>max_samples</code> option determines how many samples are executed in parallel. By default, <code>max_samples</code> is set to <code>max_connections</code> so that the connection to the Model API can be fully utilised. See the section below for more details on <code>max_connections</code>.</p>
<p>If you have additional expensive operations beyond calling models (e.g.&nbsp;using a <a href="agents.html#sec-tool-environments">Tool Environment</a>) then you may want to increase <code>max_samples</code> to fully saturate both the Model API and container subprocesses used for tool execution. When running an evaluation you’ll see an indicator of how many connections and how many subprocesses are currently active. If neither is at capacity then you will likely benefit from increasing <code>max_samples</code>.</p>
<p>Note that setting <code>max_samples</code> to an arbitrarily high number has some disadvantages: you will consume more memory (especially if using tool environments) and wait longer for completed samples to be logged (so you could lose more work if your eval task fails).</p>
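The interaction between <code>max_samples</code> and the two resource pools can be sketched with plain asyncio semaphores (illustrative names and durations only; not Inspect's implementation): each in-flight sample occupies a sample slot for its whole lifetime, but only briefly holds a connection or a subprocess.

```python
import asyncio

async def run(num_samples: int, max_samples: int,
              max_connections: int, max_subprocesses: int) -> list[int]:
    sample_gate = asyncio.Semaphore(max_samples)        # samples in flight
    connections = asyncio.Semaphore(max_connections)    # model API calls
    subprocesses = asyncio.Semaphore(max_subprocesses)  # tool subprocesses

    async def run_sample(i: int) -> int:
        async with sample_gate:
            # A sample alternates between a model call and a tool call, so it
            # holds its sample slot throughout but each resource only briefly.
            async with connections:
                await asyncio.sleep(0.01)  # model API call
            async with subprocesses:
                await asyncio.sleep(0.01)  # tool execution in a subprocess
            return i

    return await asyncio.gather(*(run_sample(i) for i in range(num_samples)))

# With max_samples equal to max_connections, a sample doing tool work would
# still occupy a slot while its connection sat idle; a larger max_samples
# lets both resources stay busy at once.
results = asyncio.run(run(num_samples=20, max_samples=10,
                          max_connections=5, max_subprocesses=5))
```

With <code>max_samples</code> at 10 against pools of 5, samples doing tool work no longer starve the connection pool, which is the saturation effect described above.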
</section>
<section id="model-apis" class="level2">
<h2 class="anchored" data-anchor-id="model-apis">Model APIs</h2>
<section id="max-connections" class="level3">
@@ -404,7 +411,7 @@ <h2 class="anchored" data-anchor-id="subprocesses">Subprocesses</h2>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> result.success:</span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> result.stdout</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">else</span>:</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">raise</span> ToolError(result.stderr)</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> execute</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>The maximum number of concurrent subprocesses can be modified using the <code>--max-subprocesses</code> option. For example:</p>
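The original example is collapsed in this diff; an invocation along these lines (the task file name <code>ctf.py</code> is illustrative) would pass the option on the command line:

```shell
# Allow up to 8 concurrent subprocesses for tool execution (sketch;
# the task file name is hypothetical)
inspect eval ctf.py --max-subprocesses 8
```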