Important:
- All Python scripts (`scripts/`, `translation_models/`, and any file ending in `.py`) are set up for prompt-contrastive decoding with the user message only.
- For system prompt experiments, please use the appropriate notebook in the `notebooks/` directory.
This repository builds on and extends the codebase of Sennrich et al. (EACL 2024) to support Llama 3.1 models and prompt-contrastive decoding, as part of Xiaojing Zhang's master's thesis at the University of Zurich. The goal is to reduce omission errors in low-resource machine translation with large language models (LLMs) through prompt-based decoding techniques.
- Support for Llama 3.0 and 3.1, including chat-based prompting
- Implementation of prompt-contrastive decoding
- Adapted scripts and new notebooks for Colab compatibility
All core logic and the original implementation are credited to Sennrich et al. (EACL 2024). This fork was extended and is maintained by Xiaojing Zhang as part of a master's thesis at the University of Zurich.
This project is based on the source-contrastive and language-contrastive decoding framework as described in Sennrich et al. (EACL 2024):
- Source-contrastive decoding: Search for a translation that maximizes P(Y|X) - λ·P(Y|X'), where X' is a random source segment. This penalizes hallucinations.
- Language-contrastive decoding: Search for a translation that maximizes P(Y|X,l_y) - λ·P(Y|X,l_y'), where l_y is the language indicator for the desired target language, and l_y' the indicator for an undesired language (such as English or the source language). This penalizes off-target translations.
- Prompt-contrastive decoding (this work): Search for a translation that maximizes P(Y|X,p_pos) - λ·P(Y|X,p_neg), where p_pos is the positive prompt encouraging the desired translation behavior, and p_neg the negative prompt inducing undesired translation behavior such as omissions. This penalizes omissions.
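All three objectives share the same mechanics: at each decoding step, the next-token log-probabilities under the contrastive input are subtracted, scaled by λ, from those under the actual input. Below is a minimal sketch of this per-step combination for a Hugging Face causal LM (a hypothetical helper for illustration, not the repository's beam-search code):

```python
import torch
import torch.nn.functional as F

def contrastive_next_token_scores(model, input_ids, contrastive_ids, lam=0.1):
    """One decoding step of contrastive scoring.

    input_ids:        prefix encoding the real input (X, or X plus p_pos)
                      followed by the partial hypothesis Y.
    contrastive_ids:  the same partial hypothesis conditioned on the
                      contrastive input (a random source X', or X plus p_neg).
    Returns scores proportional to log P(y|X,...) - lam * log P(y|X',...).
    """
    with torch.no_grad():
        pos_logits = model(input_ids).logits[:, -1, :]        # next-token logits, real input
        neg_logits = model(contrastive_ids).logits[:, -1, :]  # next-token logits, contrastive input
    return F.log_softmax(pos_logits, dim=-1) - lam * F.log_softmax(neg_logits, dim=-1)
```

Note that the CLI passes these weights as negative numbers (e.g. `--source_weight -0.7` in the example commands below), so λ above corresponds to the negated weight.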
Main modifications in this repository:
- `llama.py`:
  - Added a pad token, set the padding side, and defined EOS token IDs for Llama 3.1 models
  - Used role-based chat messages and the tokenizer's `apply_chat_template` method for Llama 3.1 models (see the sketch after this list)
  - Removed the `PromptTemplate` class, as the chat template now handles prompt formatting
  - Replaced the pipeline usage (preprocess, forward, and postprocess), since the chat template for Llama 3.1 is a list of dictionaries, which the pipeline does not accept
  - Changed the padding side and token, and stacked padded tensors into a single batch tensor as input to the model
  - Added a new parameter `is_prompt_contrastive` to handle contrastive prompts
- `__init__.py`: Added Llama 3 and 3.1 models
- `prompts.py`: New script to handle positive and negative prompts
- `mt_task.py`: Updated the `evaluate` method to handle contrastive prompt pairs
- `run.py`: Added two arguments to handle contrastive prompt pairs
- `utils_run.py` and `utils_llama.py`: Updated language codes for the FLORES+ dataset
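For illustration, a minimal sketch of the tokenizer changes and chat-template call listed above, using the Hugging Face `transformers` API (the model ID and prompt wording are assumptions, not the repository's exact code):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Llama 3.1 ships without a pad token; alias it to EOS and pad on the left,
# as is usual for batched generation with decoder-only models.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Role-based messages replace the removed PromptTemplate class.
messages = [
    {"role": "user",
     "content": "Translate the following text from Mongolian to English: ..."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so generation starts there
    return_tensors="pt",
)
```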
- `annotations/`: Manual annotation files (error analysis, omissions)
- `notebooks/`: Notebooks for demos and reproducing thesis results
- `outputs/`: Translation outputs and evaluation results generated in this thesis
- `predictions/`: Original outputs from Sennrich et al. (EACL 2024), for comparison/reference
- `scripts/`: Main experiment scripts and helper utilities
- `tests/`: Unit tests for core modules from Sennrich et al. (EACL 2024)
- `translation_models/`: Model wrappers and utilities (Llama, m2m100, small100)
- `illustration.png`, `logo.png`: Visual assets for documentation/thesis
- `LICENSE`, `README.md`, `requirements.txt`: Repository metadata and setup
- `python3 -m venv venv` (Linux/Mac) or `python -m venv venv` (Windows)
- `source venv/bin/activate` (Linux/Mac) or `venv\Scripts\activate` (Windows)
- `pip install -r requirements.txt`
- For prompt-contrastive decoding with the user message: use the Python scripts as described below.
- For prompt-contrastive decoding with a system prompt: please run the relevant notebook in `notebooks/`.
Example commands
Source-contrastive decoding with M2M-100 (418M) on Asturian–Croatian, with λ_src=0.7:
python -m scripts.run --model_path m2m100_418M --language_pairs ast-hr --source_contrastive --source_weight -0.7
Source-contrastive and language-contrastive decoding with SMaLL-100 on Pashto–Asturian, with 2 random source segments, λ_src=0.7, λ_lang=0.1, and English and Pashto as contrastive target languages:
python -m scripts.run --model_path small100 --language_pairs ps-ast --source_contrastive 2 --source_weight -0.7 --language_contrastive en ps --language_weight -0.1
Prompt-contrastive decoding with Llama 3.1 8B Instruct on Mongolian–English, with λ_prompt=0.1 and one contrastive prompt pair appended to the user message:
python -m scripts.run --model_path llama-3.1-8b-instruct --language_pairs mn-en --prompt_contrastive --prompt_weight -0.1
Source-contrastive and prompt-contrastive decoding with Llama 3.1 8B Instruct on Igbo–English, with 1 random source segment, λ_src=0.7, λ_prompt=0.1:
python -m scripts.run --model_path llama-3.1-8b-instruct --language_pairs ig-en --source_contrastive --source_weight -0.7 --prompt_contrastive --prompt_weight -0.1
Or run the provided notebook for a full Colab demo.
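For orientation, here is a hedged sketch of how the two new `run.py` arguments and a `prompts.py` prompt pair could fit together. Only the two flag names come from the commands above; the prompt wording and variable names are illustrative assumptions:

```python
import argparse

# prompts.py (sketch): a positive/negative prompt pair. The negative prompt
# deliberately invites omissions, so subtracting its scores penalizes them.
PROMPT_PAIR = {
    "positive": "Translate the complete source text without omitting anything.",
    "negative": "Translate the source text; you may leave parts out.",
}

parser = argparse.ArgumentParser()
parser.add_argument("--prompt_contrastive", action="store_true",
                    help="enable prompt-contrastive decoding with one prompt pair")
parser.add_argument("--prompt_weight", type=float, default=-0.1,
                    help="weight applied to the negative-prompt scores (negative value)")
args = parser.parse_args()

if args.prompt_contrastive:
    # p_pos goes into the user message of the real input; p_neg builds the
    # contrastive input whose log-probabilities are scaled by prompt_weight.
    p_pos, p_neg = PROMPT_PAIR["positive"], PROMPT_PAIR["negative"]
```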
- FLORES-101, as in the original repo
- FLORES+ (the `devtest` split is used for evaluation)
Multiple models are implemented:
- M2M-100 (418M): use `--model_path m2m100_418M`
- SMaLL-100: use `--model_path small100`
- Llama 3.1 8B Instruct: use `--model_path llama-3.1-8b-instruct`
- chrF2: `sacrebleu ref.txt < output.txt --metrics chrf`
- spBLEU: `sacrebleu ref.txt < output.txt --tokenize flores101`
- MetricX-23-XL: run the provided notebook.
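The same two scores can also be computed from Python via sacrebleu's metric classes (a sketch; file names as in the commands above):

```python
from sacrebleu.metrics import BLEU, CHRF

with open("output.txt") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("ref.txt") as f:
    refs = [line.rstrip("\n") for line in f]

chrf = CHRF()                        # beta defaults to 2, i.e. chrF2
spbleu = BLEU(tokenize="flores101")  # spBLEU: BLEU over the FLORES-101 sentencepiece tokenization
print(chrf.corpus_score(hyps, [refs]))
print(spbleu.corpus_score(hyps, [refs]))
```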
@inproceedings{sennrich-etal-2024-mitigating,
title={Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding},
author={Rico Sennrich and Jannis Vamvas and Alireza Mohammadshahi},
booktitle={18th Conference of the European Chapter of the Association for Computational Linguistics},
year={2024}
}