
Update tutorial.qmd (#1040)
Co-authored-by: jjallaire <[email protected]>
pravsels and jjallaire authored Dec 26, 2024
1 parent 78a4e59 commit cc6f6da
Showing 1 changed file with 2 additions and 2 deletions: docs/tutorial.qmd
@@ -78,7 +78,7 @@ inspect eval security_guide.py

[HellaSwag](https://rowanzellers.com/hellaswag/) is a dataset designed to test commonsense natural language inference (NLI) about physical situations. It includes samples that are adversarially constructed to violate common sense about the physical world, so it can be a challenge for some language models.

-For example, here is one of the questions in the dataset along with its set of possible answer (the correct answer is C):
+For example, here is one of the questions in the dataset along with its set of possible answers (the correct answer is C):

> In home pet groomers demonstrate how to groom a pet. the person
>
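Each HellaSwag item pairs a context like the one above with several candidate endings and the index of the correct one, which is then presented to the model as lettered choices. The mapping can be sketched as follows (a minimal illustration; the helper and the placeholder endings are hypothetical, not the dataset's actual schema or content):

```python
from string import ascii_uppercase


def to_choices(endings):
    """Label candidate endings A, B, C, ... for multiple-choice display."""
    return {ascii_uppercase[i]: ending for i, ending in enumerate(endings)}


# Placeholder endings for illustration only (not the real sample's choices).
endings = ["ending one", "ending two", "ending three", "ending four"]
choices = to_choices(endings)

# A zero-based label of 2 corresponds to choice "C".
label = 2
correct = ascii_uppercase[label]
```

With this framing, scoring a model's answer reduces to comparing its chosen letter against the letter derived from the label index.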
@@ -570,4 +570,4 @@ def ctf_agent(max_attempts=3, message_limit=30):

The `basic_agent()` provides a ReAct tool loop with support for retries and for encouraging the model to continue if it gives up or gets stuck. The `bash()` and `python()` tools are provided to the model with a 3-minute timeout to prevent long-running commands from getting the evaluation stuck.

-See the [full source code](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf) of the Intercode CTF example to explore the dataset and evaluation code in more depth.
+See the [full source code](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf) of the Intercode CTF example to explore the dataset and evaluation code in more depth.
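The task definition referenced in the hunk header above might look roughly like the following sketch. It assumes the `inspect_ai` package is installed; `read_dataset()` and the system prompt text are hypothetical stand-ins, not the actual Intercode CTF source:

```python
from inspect_ai import Task, task
from inspect_ai.scorer import includes
from inspect_ai.solver import basic_agent, system_message
from inspect_ai.tool import bash, python


@task
def ctf_agent(max_attempts=3, message_limit=30):
    return Task(
        dataset=read_dataset(),  # hypothetical helper that loads the CTF samples
        solver=basic_agent(
            init=system_message("You are a CTF agent."),  # placeholder prompt
            # 180-second (3-minute) timeout per tool call, per the text above
            tools=[bash(timeout=180), python(timeout=180)],
            max_attempts=max_attempts,
        ),
        message_limit=message_limit,
        scorer=includes(),  # score by checking the flag appears in the output
    )
```

The `max_attempts` and `message_limit` parameters match the signature shown in the hunk header; consult the linked repository for the authoritative implementation.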
