Improving Summarization with Human Edits

Abstract

Recent work has highlighted the potential of learning paradigms incorporating human feedback to generate high-quality text. While most approaches leverage human preferences to fine-tune large language models (LLMs) for abstractive summarization, this work explores a less-studied form of feedback: Human Edits.

We introduce Sequence Alignment (un)Likelihood Training (SALT), a novel training technique that integrates human-edited and model-generated data into the training loop. To address the scarcity of human-edited data, we propose Imitation Edits, where ground truth summaries from training data simulate the editing process. These edits, combined with model-generated summaries, reduce the need for costly human feedback.

Through experiments, we extend the study of human feedback from general-domain summarization to medical-domain summarization. Our results demonstrate that SALT improves summary quality and outperforms the conventional Direct Preference Optimization (DPO) method when applied to human-edited data. We hope this work inspires further research into scalable, effective ways to incorporate human feedback for text summarization.

Yao, Zonghai, Benjamin Schloss, and Sai Selvaraj. "Improving Summarization with Human Edits." Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

What is SALT?

Sequence Alignment (un)Likelihood Training (SALT) is a training framework designed to improve text summarization models by leveraging both human edits and model-generated outputs. SALT encourages models to align with high-quality, human-edited summaries while disfavoring undesirable outputs.

Key components of SALT include:

Human Edits: Real-world edits applied to model-generated summaries.
Imitation Edits: Simulated edits derived from ground truth summaries to reduce dependency on expensive human feedback.
Likelihood and Unlikelihood Training: A dual approach that encourages desirable edits while penalizing undesirable ones.
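
To make the dual objective concrete, here is a minimal sketch of a combined likelihood/unlikelihood loss over per-token probabilities. It is an illustration only, not the implementation in train_SALT.py: the function name, the `is_rejected` mask, and the `alpha` weight are hypothetical, and the weighting used in the paper may differ.

```python
import math

def likelihood_unlikelihood_loss(token_probs, is_rejected, alpha=1.0):
    """Average per-token loss over one sequence.

    token_probs: model probability assigned to each target token.
    is_rejected: True where the token should be discouraged.
    alpha: hypothetical weight on the unlikelihood term.
    """
    eps = 1e-12  # numerical floor so log() never receives zero
    total = 0.0
    for p, rejected in zip(token_probs, is_rejected):
        if rejected:
            # Unlikelihood term: small when the model assigns LOW probability.
            total += -alpha * math.log(max(1.0 - p, eps))
        else:
            # Likelihood term: small when the model assigns HIGH probability.
            total += -math.log(max(p, eps))
    return total / len(token_probs)

# Usage: the loss grows as the model puts more mass on a rejected token.
low = likelihood_unlikelihood_loss([0.9, 0.1], [False, True])
high = likelihood_unlikelihood_loss([0.9, 0.8], [False, True])
print(low < high)
```

In a real trainer the probabilities would come from the model's softmax output and the mask from aligning a model summary with its edited version.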

What is DPO?

Direct Preference Optimization (DPO) is a method that fine-tunes LLMs directly on pairwise human preference data, without training a separate reward model. It shifts the model's output distribution toward human-preferred samples (chosen responses) and away from less-preferred ones (rejected responses). While effective for general human preference feedback, our experiments show that SALT achieves better performance on human-edited data.
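
For readers unfamiliar with the objective, the standard DPO loss for a single chosen/rejected pair can be sketched as follows. This is a plain-Python illustration of the published formula, not the code in train_DPO.py; the function name and argument names are assumptions, and the inputs are summed sequence log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Usage: the loss is lower when the policy favors the chosen response
# more strongly than the reference model does.
good = dpo_loss(-1.0, -5.0, -2.0, -2.0)
bad = dpo_loss(-5.0, -1.0, -2.0, -2.0)
print(good < bad)
```

The `beta` value of 0.1 is only a common default and is not taken from this repository.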

Setup and Installation

Install dependencies with Poetry:
```
poetry install
```
Add your Hugging Face authentication token to the project by creating an hg_secret file.
Download spaCy's English language model:
```
python -m spacy download en_core_web_sm
```

If you encounter compatibility issues with Poetry and PyTorch (Python 3.10), run the following command:

poetry run pip install torch==2.1.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Training

Dummy dataset: this repo uses the dummy example dataset in data/avs/sub_eval_w_simulated_edits.
Run the following scripts to train models.
Train DPO:
```
python train_DPO.py
```
Train SALT:
```
python train_SALT.py
```
Train SFT (Supervised Fine-Tuning):
```
python train_SFT.py
```
The simulated dataset is available at https://huggingface.co/datasets/PrabhakarSai/after_visit_summary_simulated_edits
- Refer to the dataset card for instructions on using the dataset with this repo.

Metrics

The following metrics are used to evaluate summarization quality:

ROUGE: Standard metric for summarization quality.
ConceptRouge: Evaluates the inclusion of domain-specific concepts.
- Implementation: Refer to AutomaticConceptEval in utils/metrics.py.
- Setup: Install quickumls and set up its API endpoint at http://localhost:8123/quickumls.
SAGE: A novel metric introduced in our paper to evaluate summary quality.
- Implementation: Refer to cal_SAGE in utils/metrics.py.
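
As a rough illustration of what an overlap metric such as ROUGE measures, the sketch below computes a simplified ROUGE-1 F1 from unigram counts. It is not the implementation in utils/metrics.py: it assumes whitespace tokenization and lowercasing, and skips the stemming that standard ROUGE packages usually apply. The function name is hypothetical.

```python
from collections import Counter

def rouge1_f1(reference, candidate):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Usage: a candidate that drops one of three reference tokens.
score = rouge1_f1("take aspirin daily", "take aspirin")
print(round(score, 2))
```

A concept-level variant would apply the same precision/recall logic to sets of extracted medical concepts rather than raw tokens.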

Example Setting

In our training framework, we define:

Chosen Sentences: Sentences preferred based on human edits.
Rejected Sentences: Sentences identified as suboptimal by human edits.
Edit Simulation: Ground truth summaries are transformed to simulate human edits, reducing reliance on human feedback.

During training, SALT maximizes the likelihood of chosen sentences while applying an unlikelihood penalty to rejected sentences, guiding the model toward generating high-quality summaries.
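
One way to picture how chosen and rejected units fall out of an edit is to align the model summary with its edited version and read off what changed. The sketch below does this with Python's standard `difflib`; it is an assumption-laden illustration, not the alignment code in the sequence_alignment folder. The function name is hypothetical, and the units may be tokens or sentences depending on how the inputs are split.

```python
from difflib import SequenceMatcher

def split_by_edits(model_units, edited_units):
    """Align two sequences and return (rejected, chosen) units.

    rejected: units the editor removed or replaced in the model output.
    chosen: units the editor inserted or substituted in.
    """
    matcher = SequenceMatcher(a=model_units, b=edited_units, autojunk=False)
    rejected, chosen = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            rejected.extend(model_units[i1:i2])
        if tag in ("replace", "insert"):
            chosen.extend(edited_units[j1:j2])
    return rejected, chosen

# Usage with whitespace tokens (a made-up example, not from the dataset).
model = "patient should take ibuprofen twice daily".split()
edited = "patient should take acetaminophen twice daily".split()
rejected, chosen = split_by_edits(model, edited)
print(rejected, chosen)
```

Units that align unchanged appear in neither list; a trainer could treat them with the ordinary likelihood term.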

HG dataset

The simulated dataset is hosted on Hugging Face: https://huggingface.co/datasets/PrabhakarSai/after_visit_summary_simulated_edits
- Refer to the dataset card for instructions on using the dataset with this repo.

Citation

If SALT or this repository is useful in your research, please cite our work:

@inproceedings{yao2023improving,
  title={Improving Summarization with Human Edits},
  author={Yao, Zonghai and Schloss, Benjamin and Selvaraj, Sai},
  booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
  pages={2604--2620},
  year={2023}
}
