TaU

TaU: Test-time Adaptation for Machine Translation Evaluation by Uncertainty Minimization

Paper Slides Poster

Overview

Prerequisites

This work cannot be done without the amazing code from COMET and mt-metrics-eval.

# Install the bundled COMET code
cd comet
pip install poetry
poetry install

# Install mt-metrics-eval and download the WMT data
git clone https://github.com/google-research/mt-metrics-eval.git # Our version: bdda529ce4fae9
cd mt-metrics-eval
pip install .
alias mtme='python3 -m mt_metrics_eval.mtme'
mtme --download
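
After installation, a quick smoke test can confirm that the COMET code is usable. The snippet below is only a sketch: it assumes the bundled package exposes the standard Unbabel COMET Python API (download_model / load_from_checkpoint), and the checkpoint name wmt20-comet-da is an illustrative choice, not necessarily the one used in the paper.

    # Smoke test (sketch): load a COMET checkpoint and score one toy example.
    # Assumes the standard Unbabel COMET API; checkpoint name is illustrative.
    from comet import download_model, load_from_checkpoint

    model = load_from_checkpoint(download_model("wmt20-comet-da"))
    sample = [{"src": "Hallo Welt.", "mt": "Hello world.", "ref": "Hello, world."}]
    print(model.predict(sample, batch_size=1, gpus=0))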

Test-time Adaptation

  • Run with configurations

    1. Create the experiment script:
    cd tau/
    python create_tau_exps.py --config_file configs/{comet-da/comet-mqm/comet-qe-mqm}.yaml
    2. Run the generated script:
    cd tau/
    sh run_{comet-da/comet-mqm/comet-qe-mqm}.sh
  • CLI Usage Example (a minimal sketch of the underlying adaptation loop follows below):

    MTME_DATA=${HOME}/.mt-metrics-eval/mt-metrics-eval-v2/
    SAVE=results
    SYSTEM=Online-W.txt
    python tau.py \
        -s ${MTME_DATA}/wmt21.tedtalks/sources/en-de.txt \
        -r ${MTME_DATA}/wmt21.tedtalks/references/en-de.refA.txt -t ${SYSTEM} \
        --to_json ${SAVE}/${SYSTEM}.json \
        --lr 1e-4 --mc_dropout 30 --component ln --adapt-epoch 1 --batch_size 16 --quiet
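
For intuition, here is a minimal sketch of what the adaptation step does conceptually: keep dropout active at test time, run several stochastic forward passes (--mc_dropout), and minimize the variance of the predictions by updating only the LayerNorm parameters (--component ln). The ToyScorer model below is a stand-in invented for illustration, not the COMET architecture; tau.py is the authoritative implementation.

    # TaU-style sketch (assumption-laden): minimize MC-dropout prediction
    # variance by updating only LayerNorm parameters at test time.
    import torch
    import torch.nn as nn

    class ToyScorer(nn.Module):
        """Toy stand-in for a learned MT metric (not the real COMET model)."""
        def __init__(self, dim=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(p=0.1), nn.LayerNorm(dim)
            )
            self.head = nn.Linear(dim, 1)

        def forward(self, x):
            return self.head(self.encoder(x)).squeeze(-1)

    def adapt_by_uncertainty(model, batch, mc_dropout=30, lr=1e-4, epochs=1):
        model.train()  # keep dropout active so repeated passes differ
        # Update only LayerNorm parameters, mirroring `--component ln`.
        ln_params = [p for m in model.modules() if isinstance(m, nn.LayerNorm)
                     for p in m.parameters()]
        optimizer = torch.optim.Adam(ln_params, lr=lr)
        for _ in range(epochs):
            preds = torch.stack([model(batch) for _ in range(mc_dropout)])  # (K, B)
            loss = preds.var(dim=0).mean()  # predictive variance = uncertainty
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            return model(batch)  # scores after adaptation

    if __name__ == "__main__":
        torch.manual_seed(0)
        print(adapt_by_uncertainty(ToyScorer(), torch.randn(16, 32), mc_dropout=8))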

Meta-evaluation

  • Recommendation: first run the built-in "unit test" to make sure the meta-evaluation script reproduces the baseline result reported in the official WMT report. To do so, check the following function in tau/meta_eval_results.py:
    def verify():
      ...
      file = "" # Please replace the path with your MTME data path
      ...
  • If you used the generated script to run the experiments, you can evaluate the correlation performance with the command below (a conceptual sketch of what this step computes follows this list):
    cd tau/
    python meta_eval_results.py [comet-da-results/wmt21.tedtalks/en-de] # An example
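
Conceptually, the meta-evaluation measures how well the (adapted) metric scores correlate with human judgments at the system level. The sketch below illustrates this with SciPy; the JSON layout it assumes (one file per system, holding a list of segment-level scores written via --to_json) and the human_scores values are placeholders, and meta_eval_results.py remains the authoritative implementation.

    # Illustrative meta-evaluation (a sketch, not the repo's script):
    # correlate system-level metric scores with human judgments.
    import json
    from scipy.stats import kendalltau, pearsonr

    def system_score(json_path):
        # Average the segment-level scores of one MT system (assumed JSON layout).
        with open(json_path) as f:
            scores = json.load(f)
        return sum(scores) / len(scores)

    def meta_evaluate(results_dir, human_scores):
        # human_scores: dict mapping system file name -> human score (e.g. MQM/DA).
        names = sorted(human_scores)
        metric = [system_score(f"{results_dir}/{name}.json") for name in names]
        human = [human_scores[name] for name in names]
        pearson, _ = pearsonr(metric, human)
        kendall, _ = kendalltau(metric, human)
        return {"pearson": pearson, "kendall": kendall}

    # Placeholder usage:
    # print(meta_evaluate("results", {"Online-W.txt": 0.31, "Online-A.txt": 0.12}))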

Supplementary

Here is a valuable question raised after publication: how does TaU perform on low-resource languages and on other benchmarks without tuning?

  • We recognize and value language diversity. However, out-of-domain benchmarks for MT evaluation are scarce. We therefore ran further experiments on the earlier WMT20-News benchmark with the same learning rate (i.e., without tuning on development data); the results are as follows:

    Metric     Pl-En  Ru-En  Ta-En  Zh-En  En-Pl  En-Ru  En-Ta  En-Zh
    COMET-DA    34.5   83.6   76.4   93.1   80.0   92.5   79.8    0.7
    +TaU        34.6   84.0   77.4   93.4   79.0   91.6   75.3    1.2

Environment (For reference)

Citation

@inproceedings{zhan-etal-2023-test,
    title = "Test-time Adaptation for Machine Translation Evaluation by Uncertainty Minimization",
    author = "Zhan, Runzhe  and
      Liu, Xuebo  and
      Wong, Derek F.  and
      Zhang, Cuilian  and
      Chao, Lidia S.  and
      Zhang, Min",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.47",
    pages = "807--820",
}
