Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (ICLR 2025, Oral, top 1.8%)

📣 4/2/25: We have updated our repo structure to hopefully be more user friendly!

📣 31/1/25: We have open-sourced the Trust-Aligned models here!

📣 22/1/25: This paper has been accepted to ICLR 2025!

This repository contains the original implementation of Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (accepted at ICLR 2025). There are two parts to this repository:

Trust-Align: A preference dataset and framework that aligns LLMs to be more trustworthy, as measured by higher Trust-Score.
Trust-Eval: A framework to evaluate the trustworthiness of inline-cited outputs generated by large language models (LLMs) within the Retrieval-Augmented Generation (RAG) setting.

Paper abstract:

LLMs are an integral part of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the quality of end-to-end RAG systems, there is a lack of research on understanding the appropriateness of an LLM for the RAG task. Thus, we introduce a new metric, Trust-Score, that provides a holistic evaluation of the trustworthiness of LLMs in an RAG framework. We show that various prompting methods, such as in-context learning, fail to adapt LLMs effectively to the RAG task. Thus, we propose Trust-Align, a framework to align LLMs for higher Trust-Score. LLaMA-3-8b, aligned with our method, significantly outperforms open-source LLMs of comparable sizes on ASQA (↑10.7), QAMPARI (↑29.2), and ELI5 (↑14.9).

Data

The evaluation dataset used in Trust-Eval is available on Trust-Align Huggingface.

The SFT and DPO training dataset used in Trust-Align is also available Trust-Align Huggingface.

Trust-Eval

Trust-Eval quantifies trustworthiness on three main axis using Trust-Score:

Response Correctness: Correctness of the generated claims
Attribution Quality: Quality of citations generated. Concerns the recall (Are generated statements well-supported by the set citations?) and precision (Are the citations relevant to the statements?) of citations.
Refusal Groundedness: Ability of the model to discern if the question can be answered given the documents

We release Trust-Eval as a standalone package. You can install by following the steps below:

Set up a Python environment

conda create -n trust_eval python=3.10.13
conda activate trust_eval

Install dependencies
```
pip install trust_eval
```
Note: that vLLM will be installed with CUDA 12.1. Please ensure your CUDA setup is compatible.
Set up NLTK
```
import nltk
nltk.download('punkt_tab')
```

Please refer to Trust-Eval README for more information.

Trust-Align

Set up

conda create -n cite python=3.10.13
conda activate cite
pip install -r requirements.txt

We use the latest version of alignment-handbook for training (ver alignment-handbook-0.4.0.dev0). We followed the installation instructions on alignment-handbook repository:

git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .

Please refer to Trust-Align README for more information.

Bug or Questions?

If you have any questions related to the code or the paper, feel free to email Shang Hong ([email protected]). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!

Citation

If you find our code, data, models, or the paper useful, please cite the paper:

@misc{song2024measuringenhancingtrustworthinessllms,
      title={Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse}, 
      author={Maojia Song and Shang Hong Sim and Rishabh Bhardwaj and Hai Leong Chieu and Navonil Majumder and Soujanya Poria},
      year={2024},
      eprint={2409.11242},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.11242}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
assets		assets
trust_align		trust_align
trust_eval		trust_eval
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (ICLR 2025, Oral, top 1.8%)

Data

Trust-Eval

Trust-Align

Set up

Bug or Questions?

Citation

About

Releases

Packages

Contributors 2

Languages

declare-lab/trust-align

Folders and files

Latest commit

History

Repository files navigation

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse (ICLR 2025, Oral, top 1.8%)

Data

Trust-Eval

Trust-Align

Set up

Bug or Questions?

Citation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages