diff --git a/docs/tutorial.qmd b/docs/tutorial.qmd
index 2922b0125..48995546d 100644
--- a/docs/tutorial.qmd
+++ b/docs/tutorial.qmd
@@ -78,7 +78,7 @@ inspect eval security_guide.py
 
 [HellaSwag](https://rowanzellers.com/hellaswag/) is a dataset designed to test commonsense natural language inference (NLI) about physical situations. It includes samples that are adversarially constructed to violate common sense about the physical world, so can be a challenge for some language models.
 
-For example, here is one of the questions in the dataset along with its set of possible answer (the correct answer is C):
+For example, here is one of the questions in the dataset along with its set of possible answers (the correct answer is C):
 
 > In home pet groomers demonstrate how to groom a pet. the person
 >
@@ -570,4 +570,4 @@ def ctf_agent(max_attempts=3, message_limit=30):
 
 The `basic_agent()` provides a ReAct tool loop with support for retries and encouraging the model to continue if its gives up or gets stuck. The `bash()` and `python()` tools are provided to the model with a 3-minute timeout to prevent long running commands from getting the evaluation stuck.
 
-See the [full source code](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf) of the Intercode CTF example to explore the dataset and evaluation code in more depth.
\ No newline at end of file
+See the [full source code](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf) of the Intercode CTF example to explore the dataset and evaluation code in more depth.