Commit

Built site for gh-pages
aisi-inspect committed Oct 2, 2024
1 parent 280ae13 commit a630b5c
Showing 12 changed files with 70 additions and 70 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
121c73c1
e5d55f6d
2 changes: 1 addition & 1 deletion agents.html
@@ -428,7 +428,7 @@ <h3 class="anchored" data-anchor-id="example">Example</h3>
<span data-code-cell="annotated-cell-1" data-code-lines="27" data-code-annotation="5">Specify that Docker should be used as the sandbox environment.</span>
</dd>
</dl>
<p>The full source code for this example can be found in the Inspect GitHub repository at <a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf">intercode_ctf</a>.</p>
<p>The full source code for this example can be found in the Inspect GitHub repository at <a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf">intercode_ctf</a>.</p>
</section>
<section id="options" class="level3">
<h3 class="anchored" data-anchor-id="options">Options</h3>
2 changes: 1 addition & 1 deletion eval-logs.html
@@ -1102,7 +1102,7 @@ <h3 class="anchored" data-anchor-id="reading-logs">Reading Logs</h3>
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","selector":".lightbox","loop":false,"descPosition":"bottom","closeEffect":"zoom"});
<script>var lightboxQuarto = GLightbox({"descPosition":"bottom","selector":".lightbox","loop":false,"openEffect":"zoom","closeEffect":"zoom"});
(function() {
let previousOnload = window.onload;
window.onload = () => {
54 changes: 27 additions & 27 deletions examples/index.html
@@ -408,7 +408,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/humaneval">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/humaneval">
HumanEval: Evaluating Large Language Models Trained on Code
</a>
</div>
@@ -429,7 +429,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/mbpp">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mbpp">
MBPP: Mostly Basic Python Problems
</a>
</div>
@@ -450,7 +450,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/swe_bench">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/swe_bench">
SWE-Bench: Resolving Real-World GitHub Issues
</a>
</div>
@@ -475,7 +475,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gaia">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gaia">
GAIA: A Benchmark for General AI Assistants
</a>
</div>
@@ -499,7 +499,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/intercode_ctf">
InterCode: Capture the Flag
</a>
</div>
@@ -520,7 +520,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gdm_capabilities/in_house_ctf">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gdm_capabilities/in_house_ctf">
GDM Dangerous Capabilities: Capture the Flag
</a>
</div>
@@ -544,7 +544,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/mathematics">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mathematics">
MATH: Measuring Mathematical Problem Solving
</a>
</div>
@@ -565,7 +565,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gsm8k">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gsm8k">
GSM8K: Training Verifiers to Solve Math Word Problems
</a>
</div>
@@ -586,7 +586,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/mathvista">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mathvista">
MathVista: Evaluating Mathematical Reasoning in Visual Contexts
</a>
</div>
@@ -610,7 +610,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/arc">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/arc">
ARC: AI2 Reasoning Challenge
</a>
</div>
@@ -630,7 +630,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/evals/hellaswag">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/evals/hellaswag">
HellaSwag: Can a Machine Really Finish Your Sentence?
</a>
</div>
@@ -651,7 +651,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/piqa">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/piqa">
PIQA: Reasoning about Physical Commonsense in Natural Language
</a>
</div>
@@ -672,7 +672,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/boolq">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/boolq">
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
</a>
</div>
@@ -693,7 +693,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/drop">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/drop">
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
</a>
</div>
@@ -714,7 +714,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/winogrande">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/winogrande">
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
</a>
</div>
@@ -735,7 +735,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/race_h">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/race_h">
RACE-H: A benchmark for testing reading comprehension and reasoning abilities of neural models
</a>
</div>
@@ -756,7 +756,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/mmmu">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mmmu">
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark
</a>
</div>
@@ -777,7 +777,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/squad">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/squad">
SQuAD: A Reading Comprehension Benchmark requiring reasoning over Wikipedia articles
</a>
</div>
@@ -798,7 +798,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/ifeval">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/ifeval">
IFEval: Instruction-Following Evaluation for Large Language Models
</a>
</div>
@@ -819,7 +819,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/evals/agieval">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/evals/agieval">
AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
</a>
</div>
@@ -843,7 +843,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/evals/mmlu">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mmlu">
MMLU: Measuring Massive Multitask Language Understanding
</a>
</div>
@@ -864,7 +864,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/evals/mmlu_pro">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/mmlu_pro">
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
</a>
</div>
@@ -885,7 +885,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/gpqa">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/gpqa">
GPQA: A Graduate-Level Google-Proof Q&amp;A Benchmark
</a>
</div>
@@ -906,7 +906,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/commonsense_qa">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/commonsense_qa">
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
</a>
</div>
@@ -927,7 +927,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/truthfulqa">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/truthfulqa">
TruthfulQA: Measuring How Models Mimic Human Falsehoods
</a>
</div>
@@ -948,7 +948,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/xstest">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/xstest">
XSTest: A benchmark for identifying exaggerated safety behaviours in LLM's
</a>
</div>
@@ -969,7 +969,7 @@ <h1 class="title"><span id="sec-examples" class="quarto-section-identifier"><spa
</div>
<div class="example-info">
<div class="listing-title">
<a href="https://github.com/UKGovernmentBEIS/inspect_ai/tree/main/src/inspect_evals/pubmedqa">
<a href="https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/pubmedqa">
PubMedQA: A Dataset for Biomedical Research Question Answering
</a>
</div>
Binary file added images/toolenv-no-cleanup.png
2 changes: 1 addition & 1 deletion index.html
@@ -1114,7 +1114,7 @@ <h2 class="anchored" data-anchor-id="learning-more">Learning More</h2>
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"closeEffect":"zoom","selector":".lightbox","openEffect":"zoom","loop":false,"descPosition":"bottom"});
<script>var lightboxQuarto = GLightbox({"loop":false,"selector":".lightbox","closeEffect":"zoom","openEffect":"zoom","descPosition":"bottom"});
(function() {
let previousOnload = window.onload;
window.onload = () => {
2 changes: 1 addition & 1 deletion log-viewer.html
@@ -1074,7 +1074,7 @@ <h3 class="unlisted anchored" data-anchor-id="other-notes">Other Notes</h3>
</div>
</div>
</footer>
<script>var lightboxQuarto = GLightbox({"openEffect":"zoom","closeEffect":"zoom","selector":".lightbox","loop":false,"descPosition":"bottom"});
<script>var lightboxQuarto = GLightbox({"selector":".lightbox","closeEffect":"zoom","openEffect":"zoom","loop":false,"descPosition":"bottom"});
(function() {
let previousOnload = window.onload;
window.onload = () => {
2 changes: 1 addition & 1 deletion site_libs/bootstrap/bootstrap.min.css

Large diffs are not rendered by default.

40 changes: 20 additions & 20 deletions sitemap.xml
@@ -2,82 +2,82 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://inspect.ai-safety-institute.org.uk/index.html</loc>
<lastmod>2024-10-01T21:15:39.668Z</lastmod>
<lastmod>2024-10-01T19:48:21.698Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/tutorial.html</loc>
<lastmod>2024-10-01T21:19:55.497Z</lastmod>
<lastmod>2024-10-02T13:33:25.803Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/workflow.html</loc>
<lastmod>2024-09-25T21:06:59.146Z</lastmod>
<lastmod>2024-09-24T13:23:06.338Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/examples/index.html</loc>
<lastmod>2024-09-25T21:06:59.142Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/log-viewer.html</loc>
<lastmod>2024-09-25T21:06:59.143Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/vscode.html</loc>
<lastmod>2024-09-06T20:10:53.799Z</lastmod>
<lastmod>2024-08-23T18:45:01.497Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/solvers.html</loc>
<lastmod>2024-09-26T21:28:13.316Z</lastmod>
<lastmod>2024-09-27T06:53:10.166Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/tools.html</loc>
<lastmod>2024-10-01T21:15:39.668Z</lastmod>
<lastmod>2024-10-01T14:48:34.664Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/agents.html</loc>
<lastmod>2024-10-01T21:19:55.499Z</lastmod>
<lastmod>2024-10-02T13:33:25.803Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/scorers.html</loc>
<lastmod>2024-10-01T21:15:39.668Z</lastmod>
<lastmod>2024-10-01T13:01:44.079Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/datasets.html</loc>
<lastmod>2024-09-25T21:06:59.141Z</lastmod>
<lastmod>2024-09-27T06:53:10.166Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/models.html</loc>
<lastmod>2024-09-25T21:06:59.143Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/eval-sets.html</loc>
<lastmod>2024-09-21T09:59:32.064Z</lastmod>
<lastmod>2024-09-21T11:14:33.579Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/errors-and-limits.html</loc>
<lastmod>2024-09-25T21:06:59.142Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/caching.html</loc>
<lastmod>2024-09-25T21:06:59.141Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/parallelism.html</loc>
<lastmod>2024-09-25T21:06:59.144Z</lastmod>
<lastmod>2024-09-24T13:23:06.334Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/agents-api.html</loc>
<lastmod>2024-09-25T21:06:59.140Z</lastmod>
<lastmod>2024-09-27T10:16:26.539Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/interactivity.html</loc>
<lastmod>2024-09-13T00:12:39.701Z</lastmod>
<lastmod>2024-09-13T10:57:23.570Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/eval-logs.html</loc>
<lastmod>2024-09-06T20:10:53.779Z</lastmod>
<lastmod>2024-09-03T10:48:54.292Z</lastmod>
</url>
<url>
<loc>https://inspect.ai-safety-institute.org.uk/extensions.html</loc>
<lastmod>2024-09-16T00:25:31.239Z</lastmod>
<lastmod>2024-09-16T10:35:49.684Z</lastmod>
</url>
</urlset>