This project allows you to create and run LLM judges based on annotated datasets using Weights & Biases (wandb) and Weave for tracking and tracing.
- Create a `.env` file in the project root with the following variables:

      WANDB_EMAIL=your_wandb_email
      WANDB_API_KEY=your_wandb_api_key
      OPENAI_API_KEY=your_openai_api_key

- Install the required dependencies.
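If you want to load these variables without an extra dependency, a minimal sketch of a `.env` parser in plain Python (the project may instead rely on a library such as python-dotenv; `load_env` here is a hypothetical helper, not part of this repo):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from a .env file into os.environ."""
    loaded = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
        # don't clobber variables already set in the shell
        os.environ.setdefault(key.strip(), value.strip())
    return loaded
```

After calling `load_env()`, the keys are available via `os.environ["WANDB_API_KEY"]` and friends.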
To start the annotation app, run:

    python main.py
This will launch a web interface for annotating your dataset.
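The exact record format the app writes is not shown here; as a rough illustration, an annotation could be stored as one JSON object per line (`save_annotation` and its field names are hypothetical, not the app's actual schema):

```python
import json
from pathlib import Path

def save_annotation(path: str, example_input: str, model_output: str,
                    label: str, notes: str = "") -> dict:
    """Append one annotation as a JSON line (hypothetical record shape)."""
    record = {
        "input": example_input,
        "output": model_output,
        "label": label,   # e.g. "correct" / "incorrect"
        "notes": notes,
    }
    with Path(path).open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```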
To programmatically create an LLM judge from your wandb dataset annotations:

- Open `forge_evaluation_judge.ipynb` in a Jupyter environment.
- Run all cells in the notebook.

This will generate a judge like the one in `forged_judge`.
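Conceptually, forging a judge amounts to turning annotated examples into judging instructions. A sketch of one way this could look, assuming a simple few-shot prompt; the notebook's actual prompt format and `build_judge_prompt` helper are assumptions, not the repo's API:

```python
def build_judge_prompt(annotations: list[dict], task: str) -> str:
    """Assemble a few-shot judging prompt from annotated examples."""
    lines = [
        f"You are a judge for the task: {task}.",
        "Given an input and a model output, answer 'correct' or 'incorrect'.",
        "Examples:",
    ]
    for a in annotations:
        # each annotation supplies an input, the model's output, and the
        # human label, which becomes the demonstrated verdict
        lines.append(f"Input: {a['input']}")
        lines.append(f"Output: {a['output']}")
        lines.append(f"Verdict: {a['label']}")
    return "\n".join(lines)
```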
To load and run the generated judge:

- Open `run_forged_judge.ipynb` in a Jupyter environment.
- Run all cells in the notebook.

This will evaluate your dataset using the forged judge, with results fully tracked and traced using Weave.
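The evaluation loop can be pictured roughly as below. Decorating the per-row call with `weave.op` (from the real `weave` package) is what records each call's inputs and outputs as a trace; the sketch falls back to a no-op decorator so it runs without Weave installed. `call_judge` stands in for the forged judge's LLM call and is a placeholder, not the notebook's actual API:

```python
try:
    import weave
    op = weave.op  # traces each decorated call when weave.init() has been run
except ImportError:
    def op(fn):  # no-op fallback so the sketch works without Weave
        return fn

@op
def judge_row(row: dict, call_judge) -> dict:
    """Score one dataset row with the judge (placeholder callable)."""
    verdict = call_judge(row["input"], row["output"])
    return {**row, "verdict": verdict}

def evaluate_dataset(rows: list[dict], call_judge) -> dict:
    """Run the judge over every row and summarize accuracy."""
    results = [judge_row(r, call_judge) for r in rows]
    correct = sum(r["verdict"] == "correct" for r in results)
    return {"results": results, "accuracy": correct / len(results)}
```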
- `main.py`: Annotation app
- `forge_evaluation_judge.ipynb`: Judge creation notebook
- `run_forged_judge.ipynb`: Judge execution notebook
All components are integrated with Weave for comprehensive tracking and tracing of your machine learning workflow.
Happy evaluating! 🎉