MiTTenS: A Dataset for Evaluating Gender Mistranslation

Translation systems, including foundation models capable of translation, can produce errors that result in gender mistranslation, and such errors can be especially harmful. To measure the extent of such potential harms when translating into and out of English, we introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts, including several that are traditionally under-represented in digital resources. The dataset is constructed from handcrafted passages that target known failure patterns, longer synthetically generated passages, and natural passages sourced from multiple domains. We demonstrate the usefulness of the dataset by evaluating both neural machine translation systems and foundation models, and show that all systems exhibit gender mistranslation and potential harm, even in high-resource languages.

GitHub repository

This repository contains the data card PDF and the dataset CSV.

arXiv

https://arxiv.org/abs/2401.06935

Files

LICENSE
README.md
mittens_datacard_20241003.pdf
mittens_dataset_v5.csv
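
As a starting point for working with the dataset CSV, the sketch below reads a CSV and reports its column names, its row count, and, optionally, how many rows share each value of one column. The column names of `mittens_dataset_v5.csv` are not listed in this README, so the code assumes none: it discovers them from the header row, and `group_by` must be a name you have confirmed in the file or the data card. The function names `summarize_csv` and `summarize_file` are hypothetical helpers written for this illustration and are not part of the repository.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def summarize_csv(
    lines: Iterable[str], group_by: Optional[str] = None
) -> Tuple[List[str], int, Dict[str, int]]:
    """Return (column names, number of data rows, counts per `group_by` value).

    `lines` is any iterable of CSV text lines, such as an open file.
    `group_by` is optional and must match a column name found in the header.
    """
    reader = csv.DictReader(lines)
    # Column names come from the header row; nothing is assumed about them.
    columns = list(reader.fieldnames or [])
    if group_by is not None and group_by not in columns:
        raise KeyError(f"column {group_by!r} not found; available: {columns}")

    counts: Counter = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        if group_by is not None:
            counts[row[group_by]] += 1
    return columns, n_rows, dict(counts)


def summarize_file(
    path: str, group_by: Optional[str] = None
) -> Tuple[List[str], int, Dict[str, int]]:
    """Open a CSV file as UTF-8 text and summarize it with `summarize_csv`."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_csv(handle, group_by)
```

With the repository checked out, calling `summarize_file("mittens_dataset_v5.csv")` from the repository root would return the header columns and the row count. Passing `group_by` with a column name taken from that header then gives per-value row counts, for example to see how the rows are distributed across languages if such a column exists.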

google-research-datasets/mittens
