Dynamic Evaluation Set Generation for LLM Benchmarking [NAACL '25]
Yourbench is a powerful framework for dynamically generating evaluation sets from source documents. It addresses the limitations of static benchmarks and benchmark saturation by creating diverse, contextually rich questions tailored to specific educational levels.
- 🔄 Dynamic Generation: Create evaluation sets on-the-fly from any source documents
- 📚 Semantic Chunking: Smart document splitting that maintains context and meaning
- 🤔 Multi-hop Questions: Generate questions that require synthesizing information across document sections
- 📊 Configurable Difficulty: Tailor questions to specific educational levels
- 🔍 Diverse Question Types: Support for 10 different question types
- 🤖 Model Flexibility: Works with OpenAI and Azure OpenAI models via LiteLLM
- 📦 Hugging Face Integration: Direct dataset publishing to Hugging Face Hub
Requirements:
- Python 3.10+
- LiteLLM for model inference
- Sentence Transformers for semantic chunking
- Hugging Face Datasets for dataset management
- An OpenAI-compatible API endpoint or Azure OpenAI (more model types coming soon!)
Installation:

```bash
# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate   # Linux/Mac
# or
.\venv\Scripts\activate    # Windows

# Install dependencies
pip install -r requirements.txt
```
- Set up your environment:
```bash
# For OpenAI / OpenAI-compatible APIs
export MODEL_BASE_URL=your_openai_url
export MODEL_API_KEY=your_openai_key

# For Azure OpenAI
export AZURE_BASE_URL=your_azure_url
export AZURE_API_KEY=your_azure_key
```
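These variables configure the LiteLLM-backed inference client. As a rough illustration of how such an OpenAI-compatible call looks (an assumption about the internal wiring, not Yourbench's actual code):

```python
# Minimal sketch: route a request through LiteLLM to an
# OpenAI-compatible endpoint. How Yourbench consumes these
# variables internally is an assumption, not its actual code.
import os
from litellm import completion

response = completion(
    model="gpt-4",                           # any model your endpoint serves
    api_base=os.environ["MODEL_BASE_URL"],   # OpenAI-compatible base URL
    api_key=os.environ["MODEL_API_KEY"],
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```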
- Create a task configuration (`config.yaml`). The Configuration Guide covers all available options, and an example task configuration is included in the repository.
- Run the example task (after setting your 🤗 username / organization in the config!):

```bash
python src/yourbench/run_task.py --task-name yourbench_y1
```
Detailed documentation is available in the `docs` directory:
- Configuration Guide: Comprehensive guide to YAML configuration
- Question Generation: Details about the question generation process
- Chunking System: Information about the semantic chunking system
Dataset generation:
- Processes source documents
- Creates structured datasets
- Supports local files and Hugging Face datasets
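For intuition, turning local files into a structured dataset might look like this sketch built on Hugging Face Datasets (the column names are illustrative, not Yourbench's actual schema):

```python
# Illustrative sketch: load local documents into a structured dataset.
# The columns here are hypothetical, not Yourbench's actual schema.
from pathlib import Path
from datasets import Dataset

files = sorted(Path("examples/data").glob("*.txt"))
dataset = Dataset.from_list(
    [{"filename": f.name, "text": f.read_text()} for f in files]
)
print(dataset)
```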
Summarization:
- Generates document summaries
- Provides context for question generation
- Uses configured language model
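Conceptually, this stage is one model call per document. A minimal sketch via LiteLLM (the actual prompt lives inside Yourbench and will differ):

```python
# Hypothetical summarization call; Yourbench's real prompt differs.
from litellm import completion

def summarize(document: str, model: str = "gpt-4") -> str:
    """Produce a short summary used as context for question generation."""
    response = completion(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize this document concisely."},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content
```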
Semantic chunking:
- Splits documents intelligently
- Maintains semantic coherence
- Configurable chunk sizes and overlap
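One common way to implement this (a sketch, not necessarily Yourbench's exact algorithm) is to embed sentences with Sentence Transformers and open a new chunk wherever adjacent sentences drift apart semantically; the model name and threshold below are illustrative:

```python
# Sketch of similarity-based chunking with Sentence Transformers.
# The model choice and threshold are illustrative, not Yourbench's defaults.
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunks(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Start a new chunk wherever adjacent sentences drift apart semantically."""
    embeddings = encoder.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for prev, nxt, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        if float(np.dot(prev, nxt)) < threshold:  # cosine sim (normalized)
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```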
Multi-hop chunk pairing:
- Pairs related document chunks
- Enables complex reasoning questions
- Smart chunk selection
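A plausible selection heuristic (again a sketch with illustrative thresholds): keep chunk pairs that are topically related but not near-duplicates, so that a question genuinely needs both chunks:

```python
# Sketch of pairing chunks for multi-hop questions: related enough to
# share a topic, distinct enough that one chunk alone cannot answer.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def related_pairs(chunks: list[str], lo: float = 0.4, hi: float = 0.9):
    """Yield index pairs whose cosine similarity falls between lo and hi."""
    emb = encoder.encode(chunks, normalize_embeddings=True)
    sims = emb @ emb.T  # cosine similarities (embeddings are normalized)
    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            if lo < sims[i, j] < hi:
                yield i, j
```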
Question generation:
- Single-shot questions from individual chunks
- Multi-hop questions from chunk pairs
- 10 different question types
- Difficulty calibration
- Educational level targeting
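At its core this is a templated prompt per chunk (or chunk pair). A hypothetical single-shot version follows; Yourbench's real templates also inject the document summary and richer instructions:

```python
# Hypothetical single-shot prompt; the real templates are more elaborate.
from litellm import completion

def generate_question(chunk: str, question_type: str, level: str) -> str:
    prompt = (
        f"Write one {question_type} question answerable from the text below, "
        f"calibrated for a {level} audience.\n\n{chunk}"
    )
    response = completion(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```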
Dataset storage:
- Hugging Face integration
- Local storage options
- Dataset versioning
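Publishing goes through the standard Hugging Face Datasets API; the repo id below assumes the `hf_organization` and `dataset_name` values from the example configuration:

```python
# Sketch of pushing results to the Hub. The repo id mirrors the
# hf_organization / dataset_name fields from the example config.
from datasets import Dataset

dataset = Dataset.from_list([{"question": "...", "answer": "..."}])
dataset.push_to_hub("your-org/my_dataset", private=False)  # public repo
```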
Supported question types:
- Analytical: Break down complex ideas
- Application-based: Apply concepts to scenarios
- Clarification: Deep dive into specifics
- Counterfactual: Explore alternatives
- Conceptual: Examine theories
- True-false: Verify understanding
- Factual: Test recall
- Open-ended: Encourage discussion
- False-premise: Correct misconceptions
- Edge-case: Test boundaries
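To make the taxonomy concrete, here is a hypothetical generated record (the actual output schema may differ):

```python
# Hypothetical record; field names are illustrative only.
example = {
    "question_type": "counterfactual",
    "educational_level": "undergraduate",
    "question": (
        "How might evaluation results differ if the benchmark were static "
        "rather than regenerated from the source documents?"
    ),
    "source_chunks": [3, 7],  # a multi-hop item drawing on two chunks
}
```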
Example configuration:

```yaml
task_name: yourbench_y1
configurations:
  push_to_huggingface: true
  set_hf_repo_visibility: public
  hf_organization: your-org
  model:
    model_name: gpt-4
    model_type: openai
    max_concurrent_requests: 512
selected_choices:
  generate_dataset:
    execute: true
    files_directory: examples/data
    dataset_name: my_dataset
```
See the Configuration Guide for detailed options.
Contributions are welcome! To contribute:

- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Install development dependencies
- Make your changes
- Run tests and ensure code style compliance
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Built with:
- LiteLLM for model inference
- Sentence Transformers for semantic embeddings
- Hugging Face for dataset infrastructure