A framework for few-shot evaluation of language models.
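This tagline matches EleutherAI's lm-evaluation-harness. As a minimal sketch of what a few-shot run could look like with its Python API (assuming a recent release that exposes simple_evaluate; the checkpoint and task names below are purely illustrative):

```python
# Hedged sketch of a few-shot evaluation with lm-evaluation-harness.
# Assumes a recent lm-eval release; model and task are illustrative choices.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face transformers backend
    model_args="pretrained=gpt2",  # any HF checkpoint id
    tasks=["hellaswag"],           # benchmark task(s) to run
    num_fewshot=5,                 # number of in-context examples
)
print(results["results"])          # per-task metrics
```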
This repository accompanies our RecSys 2019 article "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.
Test your prompts, agents, and RAG pipelines. Use LLM evals to improve your app's quality and catch problems. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
🐢 Open-Source Evaluation & Testing for LLMs and ML models
The LLM Evaluation Framework
LightEval is a lightweight LLM evaluation suite that Hugging Face uses internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
Evaluation Framework for Dependency Analysis (EFDA)
Python-based tools for pre-processing, post-processing, validating, and curating spike sorting datasets.
BIRL: Benchmark on Image Registration methods with Landmark validations
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
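This describes the ragas library. A minimal sketch of scoring a RAG response with it, assuming its documented evaluate() API, default column names, and built-in faithfulness and answer relevancy metrics (these metrics call an LLM judge, so an API key such as OPENAI_API_KEY typically needs to be configured):

```python
# Hedged sketch of RAG response scoring with ragas; the sample data is
# illustrative, and the metrics invoke an LLM judge under the hood.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question":     ["When was the Eiffel Tower completed?"],
    "answer":       ["It was completed in 1889."],
    "contexts":     [["The Eiffel Tower was completed in 1889 for the World's Fair."]],
    "ground_truth": ["1889"],
})

scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)  # per-metric aggregate scores for the dataset
```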
Expressive is a cross-platform expression parsing and evaluation framework. Its cross-platform support comes from targeting .NET Standard, so it runs on practically any platform.
The official evaluation suite and dynamic data release for MixEval.
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Open-Source Evaluation for LLM Application Pipelines
Evaluate your biometric verification models in seconds.
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
LiDAR SLAM comparison and evaluation framework
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Evaluation suite for large-scale language models.
Multilingual Large Language Models Evaluation Benchmark