Model | Description | Dataset | Download |
---|---|---|---|
conv.wmt14.en-fr | Convolutional (Gehring et al., 2017) | WMT14 English-French | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) <br> newstest2012/2013: download (.tar.bz2) |
conv.wmt14.en-de | Convolutional (Gehring et al., 2017) | WMT14 English-German | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
conv.wmt17.en-de | Convolutional (Gehring et al., 2017) | WMT17 English-German | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
transformer.wmt14.en-fr | Transformer (Ott et al., 2018) | WMT14 English-French | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
transformer.wmt16.en-de | Transformer (Ott et al., 2018) | WMT16 English-German | model: download (.tar.bz2) <br> newstest2014: download (.tar.bz2) |
transformer.wmt18.en-de | Transformer (Edunov et al., 2018) WMT'18 winner | WMT'18 English-German | model: download (.tar.gz) <br> See NOTE in the archive |
transformer.wmt19.en-de | Transformer (Ng et al., 2019) WMT'19 winner | WMT'19 English-German | model: download (.tar.gz) |
transformer.wmt19.de-en | Transformer (Ng et al., 2019) WMT'19 winner | WMT'19 German-English | model: download (.tar.gz) |
transformer.wmt19.en-ru | Transformer (Ng et al., 2019) WMT'19 winner | WMT'19 English-Russian | model: download (.tar.gz) |
transformer.wmt19.ru-en | Transformer (Ng et al., 2019) WMT'19 winner | WMT'19 Russian-English | model: download (.tar.gz) |
Interactive translation via PyTorch Hub:
import torch
# List available models
torch.hub.list('pytorch/fairseq') # [..., 'transformer.wmt16.en-de', ... ]
# Load a transformer trained on WMT'16 En-De
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt16.en-de', tokenizer='moses', bpe='subword_nmt')
# The underlying model is available under the *models* attribute
import fairseq  # fairseq is importable once the hub model has been loaded
assert isinstance(en2de.models[0], fairseq.models.transformer.TransformerModel)
# Translate a sentence
en2de.translate('Hello world!')
# 'Hallo Welt!'
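The hub interface can also translate a batch of sentences and accepts generation options such as the beam size. The lines below are a sketch assuming that the translate() method of this hub interface accepts a list of strings and a beam keyword argument; check the hub docstrings if your fairseq version differs.
# Translate several sentences at once with a wider beam (assumes translate()
# accepts a list of strings and a `beam` keyword argument)
en2de.translate(['Hello world!', 'Machine translation is fun.'], beam=10)
# Returns a list with one German translation per input sentence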
Generation with the binarized test sets can be run in batch mode as follows, e.g. for WMT 2014 English-French on a GTX-1080ti:
mkdir -p data-bin
curl https://dl.fbaipublicfiles.com/fairseq/models/wmt14.v2.en-fr.fconv-py.tar.bz2 | tar xvjf - -C data-bin
curl https://dl.fbaipublicfiles.com/fairseq/data/wmt14.v2.en-fr.newstest2014.tar.bz2 | tar xvjf - -C data-bin
fairseq-generate data-bin/wmt14.en-fr.newstest2014 \
--path data-bin/wmt14.en-fr.fconv-py/model.pt \
--beam 5 --batch-size 128 --remove-bpe | tee /tmp/gen.out
# ...
# | Translated 3003 sentences (96311 tokens) in 166.0s (580.04 tokens/s)
# | Generate test with beam=5: BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
# Compute BLEU score
grep ^H /tmp/gen.out | cut -f3- > /tmp/gen.out.sys
grep ^T /tmp/gen.out | cut -f2- > /tmp/gen.out.ref
fairseq-score --sys /tmp/gen.out.sys --ref /tmp/gen.out.ref
# BLEU4 = 40.83, 67.5/46.9/34.4/25.5 (BP=1.000, ratio=1.006, syslen=83262, reflen=82787)
The following scripts provide examples of pre-processing data for the NMT task.
prepare-iwslt14.sh provides an example of pre-processing for the IWSLT'14 German to English translation task ("Report on the 11th IWSLT evaluation campaign" by Cettolo et al.).
Example usage:
cd examples/translation/
bash prepare-iwslt14.sh
cd ../..
# Binarize the dataset:
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang de --target-lang en \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/iwslt14.tokenized.de-en
# Train the model (better for a single GPU setup):
mkdir -p checkpoints/fconv
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
--lr 0.25 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--lr-scheduler fixed --force-anneal 200 \
--arch fconv_iwslt_de_en --save-dir checkpoints/fconv
# Generate:
fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/fconv/checkpoint_best.pt \
--batch-size 128 --beam 5 --remove-bpe
To train a Transformer model on IWSLT'14 German to English:
# Preparation steps are the same as for fconv model.
# Train the model (better for a single GPU setup):
mkdir -p checkpoints/transformer
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
-a transformer_iwslt_de_en --optimizer adam --lr 0.0005 -s de -t en \
--label-smoothing 0.1 --dropout 0.3 --max-tokens 4000 \
--min-lr '1e-09' --lr-scheduler inverse_sqrt --weight-decay 0.0001 \
--criterion label_smoothed_cross_entropy --max-update 50000 \
--warmup-updates 4000 --warmup-init-lr '1e-07' \
--adam-betas '(0.9, 0.98)' --save-dir checkpoints/transformer
# Average 10 latest checkpoints:
python scripts/average_checkpoints.py --inputs checkpoints/transformer \
--num-epoch-checkpoints 10 --output checkpoints/transformer/model.pt
# Generate:
fairseq-generate data-bin/iwslt14.tokenized.de-en \
--path checkpoints/transformer/model.pt \
--batch-size 128 --beam 5 --remove-bpe
The WMT English to German dataset can be preprocessed using the prepare-wmt14en2de.sh script.
By default it will produce a dataset modeled after "Attention Is All You Need" (Vaswani et al., 2017), but with news-commentary-v12 data from WMT'17.
To use only data available in WMT'14, or to replicate the results of the original "Convolutional Sequence to Sequence Learning" (Gehring et al., 2017) paper, use the --icml17 option:
bash prepare-wmt14en2de.sh --icml17
Example usage:
cd examples/translation/
bash prepare-wmt14en2de.sh
cd ../..
# Binarize the dataset:
TEXT=examples/translation/wmt17_en_de
fairseq-preprocess --source-lang en --target-lang de \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/wmt17_en_de --thresholdtgt 0 --thresholdsrc 0
# Train the model:
# If it runs out of memory, try to set --max-tokens 1500 instead
mkdir -p checkpoints/fconv_wmt_en_de
fairseq-train data-bin/wmt17_en_de \
--lr 0.5 --clip-norm 0.1 --dropout 0.2 --max-tokens 4000 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--lr-scheduler fixed --force-anneal 50 \
--arch fconv_wmt_en_de --save-dir checkpoints/fconv_wmt_en_de
# Generate:
fairseq-generate data-bin/wmt17_en_de \
--path checkpoints/fconv_wmt_en_de/checkpoint_best.pt --beam 5 --remove-bpe
prepare-wmt14en2fr.sh provides an example of pre-processing for the WMT'14 English to French translation task.
Example usage:
cd examples/translation/
bash prepare-wmt14en2fr.sh
cd ../..
# Binarize the dataset:
TEXT=examples/translation/wmt14_en_fr
fairseq-preprocess --source-lang en --target-lang fr \
--trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
--destdir data-bin/wmt14_en_fr --thresholdtgt 0 --thresholdsrc 0
# Train the model:
# If it runs out of memory, try to set --max-tokens 1000 instead
mkdir -p checkpoints/fconv_wmt_en_fr
fairseq-train data-bin/wmt14_en_fr \
--lr 0.5 --clip-norm 0.1 --dropout 0.1 --max-tokens 3000 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--lr-scheduler fixed --force-anneal 50 \
--arch fconv_wmt_en_fr --save-dir checkpoints/fconv_wmt_en_fr
# Generate:
fairseq-generate data-bin/wmt14_en_fr \
--path checkpoints/fconv_wmt_en_fr/checkpoint_best.pt --beam 5 --remove-bpe
We also support training multilingual translation models. In this example we'll
train a multilingual {de,fr}-en
translation model using the IWSLT'17 datasets.
Note that we use slightly different preprocessing here than for the IWSLT'14 De-En data above. In particular we learn a joint BPE code for all three languages and use fairseq-interactive and sacrebleu for scoring the test set.
# First install sacrebleu and sentencepiece
pip install sacrebleu sentencepiece
# Then download and preprocess the data
cd examples/translation/
bash prepare-iwslt17-multilingual.sh
cd ../..
# Binarize the de-en dataset
TEXT=examples/translation/iwslt17.de_fr.en.bpe16k
fairseq-preprocess --source-lang de --target-lang en \
--trainpref $TEXT/train.bpe.de-en --validpref $TEXT/valid.bpe.de-en \
--joined-dictionary \
--destdir data-bin/iwslt17.de_fr.en.bpe16k \
--workers 10
# Binarize the fr-en dataset
# NOTE: it's important to reuse the en dictionary from the previous step
fairseq-preprocess --source-lang fr --target-lang en \
--trainpref $TEXT/train.bpe.fr-en --validpref $TEXT/valid.bpe.fr-en \
--joined-dictionary --tgtdict data-bin/iwslt17.de_fr.en.bpe16k/dict.en.txt \
--destdir data-bin/iwslt17.de_fr.en.bpe16k \
--workers 10
# Train a multilingual transformer model
# NOTE: the command below assumes 1 GPU, but accumulates gradients from
# 8 fwd/bwd passes to simulate training on 8 GPUs
mkdir -p checkpoints/multilingual_transformer
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt17.de_fr.en.bpe16k/ \
--max-epoch 50 \
--ddp-backend=no_c10d \
--task multilingual_translation --lang-pairs de-en,fr-en \
--arch multilingual_transformer_iwslt_de_en \
--share-decoders --share-decoder-input-output-embed \
--optimizer adam --adam-betas '(0.9, 0.98)' \
--lr 0.0005 --lr-scheduler inverse_sqrt --min-lr '1e-09' \
--warmup-updates 4000 --warmup-init-lr '1e-07' \
--label-smoothing 0.1 --criterion label_smoothed_cross_entropy \
--dropout 0.3 --weight-decay 0.0001 \
--save-dir checkpoints/multilingual_transformer \
--max-tokens 4000 \
--update-freq 8
# Generate and score the test set with sacrebleu
SRC=de
sacrebleu --test-set iwslt17 --language-pair ${SRC}-en --echo src \
| python scripts/spm_encode.py --model examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model \
> iwslt17.test.${SRC}-en.${SRC}.bpe
cat iwslt17.test.${SRC}-en.${SRC}.bpe \
| fairseq-interactive data-bin/iwslt17.de_fr.en.bpe16k/ \
--task multilingual_translation --lang-pairs de-en,fr-en \
--source-lang ${SRC} --target-lang en \
--path checkpoints/multilingual_transformer/checkpoint_best.pt \
--buffer-size 2000 --batch-size 128 \
--beam 5 --remove-bpe=sentencepiece \
> iwslt17.test.${SRC}-en.en.sys
grep ^H iwslt17.test.${SRC}-en.en.sys | cut -f3 \
| sacrebleu --test-set iwslt17 --language-pair ${SRC}-en
During inference it is required to specify a single --source-lang and --target-lang, which indicates the inference language direction. --lang-pairs, --encoder-langtok and --decoder-langtok have to be set to the same values that were used during training.
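For example, if the model had been trained with a source-language token (a hypothetical variant of the training command above that adds --encoder-langtok src), the same flag must be passed again when translating. A sketch, reusing the data and checkpoint paths from the previous steps:
# Hypothetical: pass --encoder-langtok/--decoder-langtok only if they were used during training
fairseq-interactive data-bin/iwslt17.de_fr.en.bpe16k/ \
--task multilingual_translation --lang-pairs de-en,fr-en \
--encoder-langtok src \
--source-lang fr --target-lang en \
--path checkpoints/multilingual_transformer/checkpoint_best.pt \
--beam 5 --remove-bpe=sentencepiece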