In the section below we’ll talk more about how to deal with logs from failed evaluations (e.g. retrying the eval).
The location property described below is supported only in the development version of Inspect. To install the development version from GitHub:

pip install git+https://github.com/UKGovernmentBEIS/inspect_ai
The EvalLog object returned from eval() and read_eval_log() has a location property that indicates the storage location it was written to or read from. The write_eval_log() function will use this location if it isn't passed an explicit location to write to. This enables you to modify the contents of a log file returned from eval() as follows:
log = eval(my_task())[0]
# edit EvalLog as required
write_eval_log(log)
Or alternatively for an EvalLog read from a filesystem:
log = read_eval_log(log_file_path)
# edit EvalLog as required
write_eval_log(log)
If you are working with the results of an Eval Set, the returned logs are headers rather than the full log with all samples. If you want to edit logs returned from eval_set, you should read them fully, edit them, and then write them. For example:
success, logs = eval_set(tasks)
for log in logs:
    log = read_eval_log(log.location)
    # edit EvalLog as required
    write_eval_log(log)
Note that the EvalLog.location is a URI rather than a traditional file path (e.g. it could be a file:// URI, an s3:// URI, or any other URI supported by fsspec).
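For instance, because write_eval_log() accepts an explicit location, you can write a copy of a log directly to an S3 URI. This is a minimal sketch only: the keyword name location is assumed from the description above, and the bucket path is hypothetical.

# write a copy of the log to an S3 URI rather than its original location
# (assumes the parameter is named location; the bucket is hypothetical)
write_eval_log(log, location="s3://my-evals-bucket/logfile.json")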
You can enumerate, read, and write EvalLog objects using helper functions from the inspect_ai.log module such as list_eval_logs(), read_eval_log(), write_eval_log(), read_eval_log_sample(), and read_eval_log_samples().
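As a sketch of how these helpers fit together (assuming the items returned by list_eval_logs() can be passed directly to read_eval_log()), you might enumerate a log directory, read each log fully, and write it back to its original location:

from inspect_ai.log import list_eval_logs, read_eval_log, write_eval_log

for log_info in list_eval_logs():
    log = read_eval_log(log_info)
    # edit the EvalLog as required
    write_eval_log(log)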
A common workflow is to define an INSPECT_LOG_DIR for running a set of evaluations, then call list_eval_logs() to analyse the results when all the work is done:
# setup log dir context
os.environ["INSPECT_LOG_DIR"] = "./experiment-logs"

# do a bunch of evals
eval(popularity, model="openai/gpt-4")
eval(security_guide, model="openai/gpt-4")

# analyze the results in the logs
logs = list_eval_logs()
Note that list_eval_logs() lists log files recursively. Pass recursive=False to list only the log files at the root level.
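For example, to restrict the listing to the top level of the log directory (the variable name here is illustrative):

# list only the log files at the root of the log directory
top_level_logs = list_eval_logs(recursive=False)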
Use the .eval log file format, which supports compression and incremental access to samples (see details on this in the Log Format section above). If you have existing .json files you can easily batch convert them to .eval using the Log Commands described below.
If you only need access to the "header" of the log file (which includes general eval metadata as well as the evaluation results), use the header_only option of read_eval_log():
log = read_eval_log(log_file, header_only=True)
If you want to read individual samples, either read them selectively using read_eval_log_sample(), or read them iteratively using read_eval_log_samples() (which will ensure that only one sample at a time is read into memory):
# read a single sample
sample = read_eval_log_sample(log_file, id=42)

# read all samples using a generator
for sample in read_eval_log_samples(log_file):
    ...
Note that read_eval_log_samples() will raise an error if you pass it a log that does not have status=="success" (this is because it can't read all of the samples in an incomplete log). If you want to read the samples anyway, pass the all_samples_required=False option:
# will not raise an error if the log file has an "error" or "cancelled" status
for sample in read_eval_log_samples(log_file, all_samples_required=False):
    ...
Sample logs often include large pieces of content (e.g. images) that are duplicated in multiple places in the log file (input, message history, events, etc.). To keep the size of log files manageable, images and other large blocks of content are de-duplicated and stored as attachments.
When reading log files, you may want to resolve the attachments so you can get access to the underlying content. You can do this for an EvalSample using the resolve_sample_attachments() function:
from inspect_ai.log import resolve_sample_attachments

sample = resolve_sample_attachments(sample)
Note that the read_eval_log() and read_eval_log_sample() functions also take a resolve_attachments option if you want to resolve attachments at the time of reading.
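For example, a sketch of resolving attachments at read time rather than after the fact (re-using log_file and the sample id from the examples above):

# resolve attachments while reading the full log
log = read_eval_log(log_file, resolve_attachments=True)

# resolve attachments while reading a single sample
sample = read_eval_log_sample(log_file, id=42, resolve_attachments=True)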
Note that you will most typically not want to resolve attachments for an EvalSample; resolution is only required when you need direct access to the underlying content (for example, raw image data).
When an evaluation task fails due to an error or is otherwise interrupted (e.g. by a Ctrl+C), an evaluation log is still written. In many cases errors are transient (e.g. due to network connectivity or a rate limit) and can be subsequently retried.
For these cases, Inspect includes an eval-retry command and eval_retry() function that you can use to resume tasks interrupted by errors (including preserving samples already completed within the original task). For example, if you had a failing task with log file logs/2024-05-29T12-38-43_math_Gprr29Mv.json, you could retry it from the shell with:
$ inspect eval-retry logs/2024-05-29T12-38-43_math_Gprr29Mv.json
Or from Python with:
-"logs/2024-05-29T12-38-43_math_43_math_Gprr29Mv.json") eval_retry(
"logs/2024-05-29T12-38-43_math_43_math_Gprr29Mv.json") eval_retry(
Note that retry only works for tasks that are created from @task decorated functions (if a Task is created dynamically outside of an @task function, Inspect does not know how to reconstruct it for the retry).
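For example, a minimal sketch of a retry-friendly task definition (the dataset contents here are purely illustrative, and my_task matches the name used in the surrounding examples):

from inspect_ai import Task, task
from inspect_ai.dataset import Sample

@task
def my_task():
    # because the Task is created inside an @task function,
    # Inspect can reconstruct it when the eval is retried
    return Task(dataset=[Sample(input="What is 2 + 2?", target="4")])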
Note also that eval_retry() does not overwrite the previous log file, but rather creates a new one (preserving the task_id from the original file).
Here's an example of retrying a failed eval with a lower number of max_connections (the theory being that too many concurrent connections may have caused a rate limit error):
log = eval(my_task)[0]
if log.status != "success":
    eval_retry(log, max_connections=3)
When retrying a log file, Inspect will attempt to re-use completed samples from the original task. This can result in substantial time and cost savings compared to starting over from the beginning.
You can use the inspect log list command to enumerate all of the logs for a given log directory. This command will utilise the INSPECT_LOG_DIR if it is set (alternatively you can specify a --log-dir directly). You'll likely also want to use the --json flag to get more granular and structured information on the log files. For example:
$ inspect log list --json                                  # uses INSPECT_LOG_DIR
$ inspect log list --json --log-dir ./security_04-07-2024
You can also use the --status option to list only logs with a success or error status:
$ inspect log list --json --status success
$ inspect log list --json --status error
You can use the --retryable option to list only logs that are retryable:
$ inspect log list --json --retryable
You can use the inspect log dump command to print the contents of a log file. For example, here we read a local log file and a log file on Amazon S3:
$ inspect log dump file:///home/user/log/logfile.json
$ inspect log dump s3://my-evals-bucket/logfile.json
You can convert between the two underlying log formats using the inspect log convert command. The convert command takes a source path (either a log file or a directory of log files) along with two required arguments that specify the conversion (--to and --output-dir). For example:
$ inspect log convert source.json --to eval --output-dir log-output
Or for an entire directory:
$ inspect log convert logs --to eval --output-dir logs-eval
Logs that are already in the target format are simply copied to the output directory. By default, log files in the target directory will not be overwritten; however, you can add the --overwrite flag to force an overwrite.
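For example, to re-run the directory conversion shown above and replace any logs already present in the output directory:

$ inspect log convert logs --to eval --output-dir logs-eval --overwrite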
Note that the output directory is always required to enforce the practice of not doing conversions that result in side-by-side log files that are identical save for their format.
Log files are stored in JSON. You can get the JSON schema for the log file format with a call to inspect log schema:
$ inspect log schema