athina-ai/ariadne

⚠️ THIS REPOSITORY HAS MOVED ⚠️

The latest version of this repository is now at https://github.com/athina-ai/athina-evals


Overview

Ariadne AI is an open-source library for evaluating text summarization and retrieval-augmented generation (RAG) chatbots without requiring human-annotated reference summaries. Each evaluator is paired with an explanation that helps developers evaluate their LLMs and detect the reason for failure cases. Our approach leverages the LLM's reasoning to provide explanations for the failures.

Installation

pip install ariadne-ai

or run the example experiments from the source repository with Poetry:

 poetry run python run_experiment_rag.py
 poetry run python run_experiment_summarization.py

or

see the provided notebook examples

Text Summarization

Text Summarization QAG Approach

For text summarization, a question-answer generation (QAG) framework has been developed, which allows us to pinpoint failure cases in production without human-annotated reference summaries. Here is a breakdown of our approach:

  1. Question Generation: The LLM formulates closed-ended (Yes/No) questions drawing from both the summary and the main document.
  2. Summary-based Answers: An LLM answer generator responds to these questions using only the summary as a reference. The possible responses are "Yes," "No," and "Unknown."
  3. Document-based Answers: Similarly, the LLM answer generator answers the same set of questions, but this time it references the primary document. The possible responses remain "Yes," "No," and "Unknown."
  4. Evaluation Metrics: Evaluation metrics assessing the consistency between the summary-based and document-based answers are computed to draw conclusions.
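
A minimal sketch of steps 1 to 3, assuming a hypothetical ask_llm(prompt) helper that wraps your own LLM client (the helper and the prompts are illustrative, not the library's actual API):

  from typing import List

  def ask_llm(prompt: str) -> str:
      # Hypothetical helper: send a prompt to an LLM and return its text response.
      raise NotImplementedError("plug in your own LLM client here")

  def generate_questions(document: str, summary: str, n: int = 5) -> List[str]:
      # Step 1: closed-ended (Yes/No) questions drawn from the document and summary.
      prompt = (
          f"Write {n} closed-ended Yes/No questions about the key facts in the "
          f"following document and summary, one question per line.\n\n"
          f"Document:\n{document}\n\nSummary:\n{summary}"
      )
      return [q.strip() for q in ask_llm(prompt).splitlines() if q.strip()]

  def answer_from(text: str, question: str) -> str:
      # Steps 2 and 3: answer a question using only the given text as reference.
      prompt = (
          f"Answer with exactly one of: Yes, No, Unknown.\n\n"
          f"Text:\n{text}\n\nQuestion: {question}"
      )
      return ask_llm(prompt).strip()

  def qag_answers(document: str, summary: str):
      # Collect (question, summary-based answer, document-based answer) triples.
      questions = generate_questions(document, summary)
      return [(q, answer_from(summary, q), answer_from(document, q)) for q in questions]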

The following failures are detected based on the above approach:

  • Hallucination Failure: A hallucination failure occurs when a question gets a 'Yes/No' answer based on the summary but receives an 'Unknown' answer based on the original document.

  • Contradiction Failure: A contradiction failure is detected when at least one question is answered 'Yes' based on the summary, but 'No' when based on the full document, or vice-versa.

  • Non-informativeness Failure: A non-informativeness failure occurs when at least one question is answered 'Unknown' based on the summary but receives a definitive 'Yes/No' answer based on the original document.
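
These three checks amount to comparing the two answer sets question by question; here is a sketch of that comparison, building on the hypothetical qag_answers helper above:

  def classify_failures(answer_pairs):
      # Step 4: compare summary-based vs. document-based answers.
      # Each element of answer_pairs is (question, summary_answer, document_answer).
      failures = []
      for question, from_summary, from_document in answer_pairs:
          if from_summary in ("Yes", "No") and from_document == "Unknown":
              failures.append((question, "hallucination"))
          elif {from_summary, from_document} == {"Yes", "No"}:
              failures.append((question, "contradiction"))
          elif from_summary == "Unknown" and from_document in ("Yes", "No"):
              failures.append((question, "non-informativeness"))
      return failures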

Retrieval-Augmented Generation (RAG)

Here is a breakdown of our approach:

  1. Evaluation: The LLM determines whether a criterion is met, leveraging its reasoning.
  2. Explanation: The LLM explains the reasoning behind its decision, providing clarity regarding the failure cases.

The following failure cases are detected:

  1. Faithfulness Failure: A faithfulness failure occurs if the response cannot be inferred purely from the context provided.
  2. Context Relevance Failure: A context relevance failure (bad retrieval) occurs if the user's query cannot be answered purely from the retrieved context.
  3. Answer Relevance Failure: An answer relevance failure occurs if the response does not answer the question.
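
A rough sketch of how these criteria could be checked, reusing the hypothetical ask_llm helper from the summarization example (the prompts and the pass/fail format are assumptions, not the library's actual implementation):

  def check_criterion(criterion: str, evidence: str) -> dict:
      # Ask the LLM for a verdict plus an explanation of its reasoning.
      prompt = (
          f"{criterion}\n\n{evidence}\n\n"
          "Answer 'Pass' or 'Fail' on the first line, then explain your reasoning."
      )
      verdict, _, explanation = ask_llm(prompt).partition("\n")
      return {"passed": verdict.strip().lower() == "pass",
              "explanation": explanation.strip()}

  def evaluate_rag(query: str, context: str, response: str) -> dict:
      return {
          "faithfulness": check_criterion(
              "Can the response be inferred purely from the context provided?",
              f"Context:\n{context}\n\nResponse:\n{response}",
          ),
          "context_relevance": check_criterion(
              "Can the query be answered purely from the retrieved context?",
              f"Context:\n{context}\n\nQuery:\n{query}",
          ),
          "answer_relevance": check_criterion(
              "Does the response answer the query?",
              f"Query:\n{query}\n\nResponse:\n{response}",
          ),
      }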

Contribution

Please feel free to reach out to [email protected] if you would like to contribute. You can find more on how to integrate these evaluations into your product at https://docs.athina.ai.

License

Apache License 2.0
