From 063b320fd6864656e5f2894f0fec6a94f8c89f2c Mon Sep 17 00:00:00 2001
From: aisi-inspect <166920645+aisi-inspect@users.noreply.github.com>
Date: Thu, 3 Oct 2024 14:02:29 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll       |   2 +-
 agents-api.html |   3 +-
 agents.html     | 167 ++++++++++++++++++++++++------------------------
 eval-logs.html  |   2 +-
 index.html      |   2 +-
 log-viewer.html |   2 +-
 search.json     |  19 ++++--
 sitemap.xml     |   8 +--
 tools.html      |  97 +++++++++++++---------------
 tutorial.html   |  26 ++++----
 vscode.html     |   2 +-
 workflow.html   |   2 +-
 12 files changed, 168 insertions(+), 164 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index f95ee5a4d..00bc099a0 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-e5d55f6d
\ No newline at end of file
+fb91b253
\ No newline at end of file
diff --git a/agents-api.html b/agents-api.html
index b1f3f8cf8..eb1ef82a1 100644
--- a/agents-api.html
+++ b/agents-api.html
@@ -421,7 +421,7 @@

Custom Loop

 def agent_loop():
     async def solve(state: TaskState, generate: Generate):
         model = get_model()
-        while True:
+        while not state.completed:
             # call model
             output = await model.generate(state.messages, state.tools)
@@ -438,6 +438,7 @@

Custom Loop

         return state
     return solve
+

The state.completed flag is automatically set to True if max_messages for the task is exceeded, so we check it at the top of the loop.
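The effect of checking the flag at the top of the loop can be illustrated with a toy model. This is a minimal sketch, not Inspect code: `ToyState` and `run_loop` are hypothetical stand-ins, and the assumption is that the flag reads as true once the message limit has been reached.

```python
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Hypothetical stand-in for TaskState; not the Inspect class."""

    max_messages: int
    messages: list[str] = field(default_factory=list)

    @property
    def completed(self) -> bool:
        # Assumption: the flag becomes true once the message limit is reached.
        return len(self.messages) >= self.max_messages


def run_loop(state: ToyState) -> ToyState:
    # Because the flag is checked before each iteration, no further
    # "model call" happens after the limit has been hit.
    while not state.completed:
        state.messages.append(f"assistant turn {len(state.messages) + 1}")
    return state


final = run_loop(ToyState(max_messages=3))
print(len(final.messages))
```

With `while True:` the same loop would need an explicit `break` to stop at the limit, which is the situation the change above avoids.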

You can imagine several ways you might want to customise this loop:

  1. Adding another termination condition for the output satisfying some criteria.
  2.
diff --git a/agents.html b/agents.html
index df268cea4..3d843df44 100644
--- a/agents.html
+++ b/agents.html
@@ -516,7 +516,7 @@

    Custom Scaffold

     def agent_loop():
         async def solve(state: TaskState, generate: Generate):
             model = get_model()
    -        while True:
    +        while not state.completed:
                 # call model
                 output = await model.generate(state.messages, state.tools)
    @@ -533,6 +533,7 @@

    Custom Scaffold

             return state
         return solve
    +

    The state.completed flag is automatically set to True if max_messages for the task is exceeded, so we check it at the top of the loop.

    You can imagine several ways you might want to customise this loop:

    1. Adding another termination condition for the output satisfying some criteria.
    2.
    @@ -924,11 +925,7 @@

      Files

      Script

      -

      If there is a Sample setup script it will be executed within the default sandbox environment after any Sample files are copied into the environment. The setup field can be either the script contents, a file path containing the script, or a base64 encoded Data URL.

      -

      The setup script is by default interpreted as a bash script, however you can have it executed by another interpreter using a shebang comment. For example, this will be executed as a Python script:

      -
      #!/usr/bin/env python3
      -
      -print('hello from python')
      +

      If there is a Sample setup bash script it will be executed within the default sandbox environment after any Sample files are copied into the environment. The setup field can be either the script contents, a file path containing the script, or a base64 encoded Data URL.
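As an illustration of the third option, here is a minimal sketch of encoding a script as a base64 Data URL and decoding it again. The helper names are hypothetical, and the `text/plain` media type is an assumption rather than something the documentation specifies.

```python
import base64


def script_to_data_url(script: str, mime_type: str = "text/plain") -> str:
    """Encode script text as a base64 Data URL (media type is an assumption)."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def data_url_to_script(url: str) -> str:
    """Decode a base64 Data URL back into the original script text."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload).decode("utf-8")


setup_script = "#!/bin/bash\necho 'preparing environment'\n"
url = script_to_data_url(setup_script)
print(url)
```

The resulting string could then be supplied wherever the setup field accepts a Data URL; passing the script contents directly is simpler when the script is short.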

      @@ -964,14 +961,14 @@

      Docker Configurat
      compose.yaml
      -
      services:
      -  default: 
      -    build: .
      -    init: true
      -    command: tail -f /dev/null
      -    cpus: 1.0
      -    mem_limit: 0.5gb
      -    network_mode: none
      +
      services:
      +  default: 
      +    build: .
      +    init: true
      +    command: tail -f /dev/null
      +    cpus: 1.0
      +    mem_limit: 0.5gb
      +    network_mode: none

      The init: true entry enables the container to respond to shutdown requests. The command is provided to prevent the container from exiting after it starts.

      Here is what a simple compose.yaml would look like for a local pre-built image named ctf-agent-environment (resource and network limits excluded for brevity):

      @@ -979,34 +976,34 @@

      Docker Configurat
      compose.yaml
      -
      services:
      -  default: 
      -    image: ctf-agent-environment
      -    x-local: true
      -    init: true
      -    command: tail -f /dev/null
      +
      services:
      +  default: 
      +    image: ctf-agent-environment
      +    x-local: true
      +    init: true
      +    command: tail -f /dev/null

      The ctf-agent-environment image does not exist on a remote registry, so we add x-local: true to indicate that it should not be pulled. Tagged local images are also not pulled by default (so x-local: true is not required for them). For example:

      compose.yaml
      -
      services:
      -  default: 
      -    image: ctf-agent-environment:1.0.0
      -    init: true
      -    command: tail -f /dev/null
      +
      services:
      +  default: 
      +    image: ctf-agent-environment:1.0.0
      +    init: true
      +    command: tail -f /dev/null

      If we are using an image from a remote registry we similarly don’t need to include x-local:

      compose.yaml
      -
      services:
      -  default:
      -    image: python:3.12-bookworm
      -    init: true
      -    command: tail -f /dev/null
      +
      services:
      +  default:
      +    image: python:3.12-bookworm
      +    init: true
      +    command: tail -f /dev/null

      See the Docker Compose documentation for information on all available container options.

      @@ -1016,23 +1013,23 @@

      Multiple Environment
      compose.yaml
      -
      services:
      -  default:
      -    image: ctf-agent-environment
      -    x-local: true
      -    init: true
      -    cpus: 1.0
      -    mem_limit: 0.5gb
      -  victim:
      -    image: ctf-victim-environment
      -    x-local: true
      -    init: true
      -    cpus: 1.0
      -    mem_limit: 1gb
      +
      services:
      +  default:
      +    image: ctf-agent-environment
      +    x-local: true
      +    init: true
      +    cpus: 1.0
      +    mem_limit: 0.5gb
      +  victim:
      +    image: ctf-victim-environment
      +    x-local: true
      +    init: true
      +    cpus: 1.0
      +    mem_limit: 1gb

      The first environment listed is the “default” environment, and can be accessed from within a tool with a normal call to sandbox(). Other environments would be accessed by name, for example:

      -
      sandbox()          # default sandbox environment
      -sandbox("victim")  # named sandbox environment
      +
      sandbox()          # default sandbox environment
      +sandbox("victim")  # named sandbox environment
      @@ -1050,53 +1047,53 @@

      Multiple Environment

      Infrastructure

      Note that in many cases you’ll want to provision additional infrastructure (e.g. other hosts or volumes). For example, here we define an additional container (“writer”) as well as a volume shared between the default container and the writer container:

      -
      services:
      -  default: 
      -    image: ctf-agent-environment
      -    x-local: true
      -    init: true
      -    volumes:
      -      - ctf-challenge-volume:/shared-data
      -    
      -  writer:
      -    image: ctf-challenge-writer
      -    x-local: true
      -    init: true
      -    volumes:
      -      - ctf-challenge-volume:/shared-data
      -volumes:
      -  ctf-challenge-volume:
      +
      services:
      +  default: 
      +    image: ctf-agent-environment
      +    x-local: true
      +    init: true
      +    volumes:
      +      - ctf-challenge-volume:/shared-data
      +    
      +  writer:
      +    image: ctf-challenge-writer
      +    x-local: true
      +    init: true
      +    volumes:
      +      - ctf-challenge-volume:/shared-data
      +volumes:
      +  ctf-challenge-volume:

      See the documentation on Docker Compose files for information on their full schema and feature set.

      Sample Metadata

      You might want to interpolate Sample metadata into your Docker compose files. You can do this using the standard compose environment variable syntax, where any metadata in the Sample is made available with a SAMPLE_METADATA_ prefix. For example, you might have a per-sample memory limit (with a default value of 0.5gb if unspecified):

      -
      services:
      -  default:
      -    image: ctf-agent-environment
      -    x-local: true
      -    init: true
      -    cpus: 1.0
      -    mem_limit: ${SAMPLE_METADATA_MEMORY_LIMIT-0.5gb}
      +
      services:
      +  default:
      +    image: ctf-agent-environment
      +    x-local: true
      +    init: true
      +    cpus: 1.0
      +    mem_limit: ${SAMPLE_METADATA_MEMORY_LIMIT-0.5gb}

      Note the - suffix, which provides the default value of 0.5gb. It is important to include this so that a default value is available when the compose file is read without the context of a Sample (for example, when pulling/building images at startup).
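The default-value behaviour can be emulated in a few lines of Python. This is a rough sketch of the `${VAR-default}` and `${VAR:-default}` forms only, not Compose's actual parser; `interpolate` is a hypothetical helper and the variable name follows the SAMPLE_METADATA_ prefix described above.

```python
import re

# Matches ${NAME-default} and ${NAME:-default}; other Compose forms are ignored.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?P<op>:?-)(?P<default>[^}]*)\}"
)


def interpolate(text: str, env: dict[str, str]) -> str:
    """Substitute variables, falling back to the inline default."""

    def replace(match: re.Match) -> str:
        name, op, default = match["name"], match["op"], match["default"]
        value = env.get(name)
        if value is None:
            return default  # variable unset: both forms use the default
        if op == ":-" and value == "":
            return default  # ':-' also treats an empty value as missing
        return value

    return _PATTERN.sub(replace, text)


line = "mem_limit: ${SAMPLE_METADATA_MEMORY_LIMIT-0.5gb}"
print(interpolate(line, {}))
print(interpolate(line, {"SAMPLE_METADATA_MEMORY_LIMIT": "2gb"}))
```

With an empty environment, which corresponds to reading the file outside the context of a Sample, the line falls back to `0.5gb`; with the metadata present, the per-sample value is used.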

      Environment Cleanup

      When a task is completed, Inspect will automatically clean up resources associated with the sandbox environment (e.g. containers, images, and networks). If for any reason resources are not cleaned up (e.g. if the cleanup itself is interrupted via Ctrl+C), you can clean up all environments globally with the inspect sandbox cleanup command. For example, here we clean up all environments associated with the docker provider:

      -
      $ inspect sandbox cleanup docker
      +
      $ inspect sandbox cleanup docker

      In some cases you may prefer not to clean up environments. For example, you might want to examine their state interactively from the shell in order to debug an agent. Use the --no-sandbox-cleanup argument to do this:

      -
      $ inspect eval ctf.py --no-sandbox-cleanup
      +
      $ inspect eval ctf.py --no-sandbox-cleanup

      You can also do this when using eval():

      -
      eval("ctf.py", sandbox_cleanup = False)
      +
      eval("ctf.py", sandbox_cleanup = False)

      When you do this, you’ll see a list of sandbox containers printed out which includes the ID of each container. You can then use this ID to get a shell inside one of the containers:

      -
      docker exec -it inspect-intercode_ctf-ipg9tbviycpvlgwja5anyvn-default-1 bash
      +
      docker exec -it inspect-intercode_ctf-ipg9tbviycpvlgwja5anyvn-default-1 bash

      When you no longer need the environments, you can clean them up either all at once or individually:

      -
      # cleanup all environments
      -inspect sandbox cleanup docker
      -
      -# cleanup single environment
      -inspect sandbox cleanup docker inspect-intercode_ctf-ipg9tbviycpvlgwja5anyvn
      +
      # cleanup all environments
      +inspect sandbox cleanup docker
      +
      +# cleanup single environment
      +inspect sandbox cleanup docker inspect-intercode_ctf-ipg9tbviycpvlgwja5anyvn
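In the example above, the environment name passed to the cleanup command is the container name without its trailing service name and index. A minimal sketch of deriving one from the other follows; it assumes the `<environment>-<service>-<index>` naming seen in the example, and `environment_from_container` is a hypothetical helper, not part of Inspect.

```python
def environment_from_container(container: str, service: str = "default") -> str:
    """Strip the '-<service>-<index>' suffix from a container name."""
    prefix, sep, index = container.rpartition("-")
    if not sep or not index.isdigit():
        raise ValueError(f"no numeric index suffix in {container!r}")
    suffix = f"-{service}"
    if not prefix.endswith(suffix):
        raise ValueError(f"{container!r} does not belong to service {service!r}")
    return prefix[: -len(suffix)]


container = "inspect-intercode_ctf-ipg9tbviycpvlgwja5anyvn-default-1"
environment = environment_from_container(container)
print(f"inspect sandbox cleanup docker {environment}")
```

The service name is passed explicitly because service names may themselves contain hyphens, so it cannot be recovered reliably by splitting alone.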

      Resource Management

      @@ -1110,13 +1107,13 @@

      Running Containers

      compose.yaml
      -
      services:
      -  default: 
      -    image: ctf-agent-environment
      -    x-local: true
      -    command: tail -f /dev/null
      -    cpus: 1.0
      -    mem_limit: 0.5gb
      +
      services:
      +  default: 
      +    image: ctf-agent-environment
      +    x-local: true
      +    command: tail -f /dev/null
      +    cpus: 1.0
      +    mem_limit: 0.5gb
      @@ -1128,7 +1125,7 @@

      Concurrent Execution

      Troubleshooting

      You can view more detailed logging around the creation and use of sandbox environments by using the sandbox log level. For example:

      -
      $ inspect eval ctf.py --log-level sandbox
      +
      $ inspect eval ctf.py --log-level sandbox

      The sandbox log level is just above warning (so it will not show http or debug level messages).

      diff --git a/eval-logs.html b/eval-logs.html
      index d5f5f7d44..a1d72bb62 100644
      --- a/eval-logs.html
      +++ b/eval-logs.html
      @@ -1102,7 +1102,7 @@

      Reading Logs

      -