
# RAGAS

RAGAS is a Python library for evaluating RAG (Retrieval Augmented Generation) pipelines using a variety of metrics. It can also generate synthetic test sets for use in this evaluation process (see this example based on the EIDC metadata descriptions).

## Usage

The library is designed to be used with ChatGPT. If you have a ChatGPT login and access token, follow the instructions in the documentation. Running against a local LLM requires a few additional steps beyond what is described there.

### Fixing Nested Async Runner

The RAGAS library runs its work on nested asynchronous event loops, which often causes problems in environments that already have a running event loop (such as Jupyter notebooks). To fix this issue, use the nest_asyncio library:

```shell
pip install nest-asyncio
```

then, in your code, before running RAGAS:

```python
import nest_asyncio
nest_asyncio.apply()
```
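The failure mode this works around can be reproduced with plain asyncio (a minimal sketch, no RAGAS required): calling `asyncio.run()` while another event loop is already running raises a `RuntimeError`, which is exactly what `nest_asyncio.apply()` patches the loop to allow.

```python
import asyncio

async def outer():
    # Inside a running event loop, a second asyncio.run() raises
    # RuntimeError -- the failure mode RAGAS can trigger when it
    # schedules nested async work.
    try:
        asyncio.run(asyncio.sleep(0))
    except RuntimeError as exc:
        return str(exc)
    return "no error"

print(asyncio.run(outer()))  # reports the nested-loop RuntimeError
```

After `nest_asyncio.apply()` the nested call succeeds instead of raising.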

### Setting RunConfig

Additionally, running against a local LLM can overwhelm the LLM service with too many concurrent requests. To avoid this, create a RunConfig that limits the number of workers so your local LLM processes only one request at a time:

```python
from ragas.run_config import RunConfig

config = RunConfig(max_workers=1)
```

It may also be helpful to set the max_retries parameter on this config to move on sooner after failures.

This config can be passed to any RAGAS method that accepts a run_config parameter, e.g.:

```python
TestsetGenerator.from_langchain(
    llm, llm, embeddings,
    run_config=RunConfig(max_workers=1, max_retries=1),
)
```
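The same throttling applies at evaluation time. Below is a hedged configuration sketch, assuming `ragas.evaluate` accepts a `run_config` parameter and the usual question/answer/contexts dataset columns; the dataset contents and metric choice are illustrative, and running it requires a live Ollama server:

```python
from datasets import Dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import faithfulness
from ragas.run_config import RunConfig

llm = ChatOllama(model='mistral-nemo')
embeddings = OllamaEmbeddings(model='mistral-nemo')

# Illustrative single-row dataset; column names follow the RAGAS convention
data = Dataset.from_dict({
    "question": ["What does EIDC stand for?"],
    "answer": ["Environmental Information Data Centre"],
    "contexts": [["EIDC is the Environmental Information Data Centre."]],
})

# max_workers=1 keeps a local LLM to one request at a time;
# max_retries=1 abandons failing requests sooner
result = evaluate(
    data,
    metrics=[faithfulness],
    llm=llm,
    embeddings=embeddings,
    run_config=RunConfig(max_workers=1, max_retries=1),
)
```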

## Choice of LLM

Whilst testing various LLMs available through Ollama, it became apparent that some do a better job of returning responses in the specified format (something that is essential for RAGAS to run correctly). During testing, one of the better models at producing properly formatted responses was mistral-nemo. To use this model with RAGAS:

```python
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model='mistral-nemo')
embeddings = OllamaEmbeddings(model='mistral-nemo')
```

It may also be necessary to increase the default context size when using Ollama-based models via num_ctx, e.g.:

```python
llm = ChatOllama(model='mistral-nemo', num_ctx=16384)
```

## Notebooks

The ragas_synth.ipynb notebook can be run to generate a synthetic test set similar to that found in data/. The ragas_eval.ipynb notebook can be run to generate a set of metrics based upon this synthetic test set and the responses retrieved by a RAG pipeline (or any other kind of LLM response). The various evaluation metrics and how to interpret them are described here.