This repository has been archived by the owner on Oct 10, 2023. It is now read-only.

# Zeno + OpenAI Evals


OpenAI's Evals library is a valuable resource that provides evaluation sets for LLMs.

This repo is a hub for exploring the results of these evaluations using the Zeno evaluation tool.

## Add New Evals

To add a new eval, add an entry to `evals/evals.yaml` with the following fields:

- `results-file`: The first `.jsonl` result from oaievals
- `link`: A link to the evals commit for this evaluation
- `description`: A succinct description of what the evaluation is testing
- `second-results-file`: An optional second `.jsonl` result from oaievals. It must use the same dataset as the first one.
- `functions-file`: An optional Python file with Zeno functions for the evaluations.

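A hypothetical entry might look like the following sketch. The eval name, file paths, and URL below are placeholders, not real entries; match the layout of the existing entries in `evals/evals.yaml`:

```yaml
# Hypothetical entry -- all names, paths, and the link are placeholders.
my-eval:
  results-file: ./results/my-eval-run-1.jsonl
  link: https://github.com/openai/evals
  description: A short summary of what this evaluation tests.
  # Optional second run over the same dataset, for comparison:
  second-results-file: ./results/my-eval-run-2.jsonl
  # Optional Zeno functions for this eval:
  functions-file: ./functions/my-eval-functions.py
```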
Make sure you test your evals locally before submitting a PR!
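As part of local testing, it can help to confirm that a results file is well-formed JSONL before pointing the hub at it. A minimal sketch using only the standard library (the `check_jsonl` helper is ours, not part of this repo):

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Parse every line of a .jsonl results file and return the record count.

    Raises ValueError with the offending line number if any line is not
    valid JSON, so a malformed results file fails fast before a PR.
    """
    count = 0
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"{path}:{lineno}: invalid JSON ({err})")
        count += 1
    return count
```

Running `check_jsonl` on each `results-file` (and `second-results-file`, if present) catches truncated or hand-edited files early.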

## Running

```sh
poetry install

python -m zeno-evals-hub evals/evals.yaml
```