A framework for few-shot evaluation of language models.
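This tagline matches EleutherAI's lm-evaluation-harness. As a minimal sketch of what a few-shot run could look like with its Python API (assuming a recent release that exposes simple_evaluate; the checkpoint and task names below are purely illustrative):

```python
# Hedged sketch of a few-shot evaluation with lm-evaluation-harness.
# Assumes a recent lm-eval release; model and task are illustrative choices.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # Hugging Face transformers backend
    model_args="pretrained=gpt2",  # any HF checkpoint id
    tasks=["hellaswag"],           # benchmark task(s) to run
    num_fewshot=5,                 # number of in-context examples
)
print(results["results"])          # per-task metrics
```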
This repository accompanies our RecSys 2019 article "Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches" and several follow-up studies.
Test your prompts, agents, and RAG pipelines. Use LLM evals to improve your app's quality and catch problems. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
🐢 Open-Source Evaluation & Testing for LLMs and ML models
The LLM Evaluation Framework
LightEval is a lightweight LLM evaluation suite that Hugging Face uses internally, alongside its recently released LLM data processing library datatrove and LLM training library nanotron.
Evaluation Framework for Dependency Analysis (EFDA)
Python-based tools for pre-processing, post-processing, validating, and curating spike sorting datasets.
BIRL: Benchmark on Image Registration methods with Landmark validations
Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications.
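This describes the ragas library. A minimal sketch of scoring a RAG response with it, assuming its documented evaluate() API, default column names, and built-in faithfulness and answer relevancy metrics (these metrics call an LLM judge, so an API key such as OPENAI_API_KEY typically needs to be configured):

```python
# Hedged sketch of RAG response scoring with ragas; the sample data is
# illustrative, and the metrics invoke an LLM judge under the hood.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = Dataset.from_dict({
    "question":     ["When was the Eiffel Tower completed?"],
    "answer":       ["It was completed in 1889."],
    "contexts":     [["The Eiffel Tower was completed in 1889 for the World's Fair."]],
    "ground_truth": ["1889"],
})

scores = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(scores)  # per-metric aggregate scores for the dataset
```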
Expressive is a cross-platform expression parsing and evaluation framework. Its cross-platform support comes from targeting .NET Standard, so it runs on practically any platform.
The official evaluation suite and dynamic data release for MixEval.
Optical Flow Dataset and Benchmark for Visual Crowd Analysis
Open-Source Evaluation for LLM Application Pipelines
Evaluate your biometric verification models in seconds.
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
LiDAR SLAM comparison and evaluation framework
Moonshot - A simple and modular tool to evaluate and red-team any LLM application.
Evaluation suite for large-scale language models.
Multilingual Large Language Models Evaluation Benchmark