Summary
This is the repo associated with the paper Sentiment-based Candidate Selection for NMT, co-written by me (Alex Jones) and my supervisor Derry Wijaya. The paper describes a decoder-side approach for selecting the translation candidate that best preserves the automatically scored sentiment of the source text. To this end, we train three distinct sentiment classifiers: an English BERT model, a Spanish XLM-RoBERTa model, and an XLM-RoBERTa model fine-tuned on English but used for sentiment classification in other languages, such as French, Finnish, and Indonesian. We compute a softmax over the logits returned by these classifiers to obtain the probability of a text being in the positive class, and call this number the "sentiment score": for a text $t$ with classifier logits $z_0$ (negative) and $z_1$ (positive),

$$s(t) = \mathrm{softmax}(z)_1 = \frac{e^{z_1}}{e^{z_0} + e^{z_1}}$$
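As a minimal sketch of how a score like this can be computed with the Transformers library (the checkpoint path "sentiment-bert-en" is a placeholder, not a file shipped in this repo, and the label order is an assumption):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "sentiment-bert-en" is a placeholder for one of the fine-tuned classifier
# checkpoints described below, not a model distributed with this README.
tokenizer = AutoTokenizer.from_pretrained("sentiment-bert-en")
model = AutoModelForSequenceClassification.from_pretrained("sentiment-bert-en")
model.eval()

def sentiment_score(text: str) -> float:
    """Probability that `text` is positive: softmax over the two class logits."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 2): [negative, positive]
    return torch.softmax(logits, dim=-1)[0, 1].item()  # assumes label 1 = positive
```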
We then generate translation candidates $y_1, \ldots, y_n$ using beam search and select the candidate whose sentiment score differs least from that of the source text $x$:

$$\hat{y} = \arg\min_{y_i} \, |s(y_i) - s(x)|$$
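A sketch of this selection rule, with the scoring function passed in so the same logic works with any of the target-language classifiers:

```python
from typing import Callable, List

def select_candidate(source_score: float,
                     candidates: List[str],
                     score_fn: Callable[[str], float]) -> str:
    """Return the candidate whose sentiment score is closest to the source's."""
    return min(candidates, key=lambda c: abs(score_fn(c) - source_score))
```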
We conduct human evaluations on English-Spanish and English-Indonesian translations with proficient bilingual speakers and report the results in our paper. We also provide examples of tweets translated using this method in the Discussion and the Appendix.
Dependencies
- PyTorch
- Transformers
- scikit-learn
- SciPy
- BeautifulSoup (for text preprocessing)
- NumPy
- pandas
Sentiment Classification
We construct sentiment classifiers by fine-tuning pretrained language models on labeled sentiment data in English and Spanish separately. The English-only sentiment classifier is built on BERT; the notebook for training is available here and is based on the BERT fine-tuning tutorial by Chris McCormick and Nick Ryan (as are all the notebooks we used for training our sentiment classifiers; citations are provided in-notebook). We also fine-tune XLM-RoBERTa on the annotated Spanish data, and again on the English sentiment data. The sentiment models themselves (the PyTorch files containing the parameters) are available here, and the annotated sentiment data is available at the links below.
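Schematically, the fine-tuning looks like the following condensed sketch, which uses the Hugging Face Trainer API rather than the manual training loop in the notebooks; the data path, label convention, and hyperparameters are illustrative, not the exact settings from the paper:

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical CSV with "text" and "label" (0 = negative, 1 = positive)
# columns; substitute the labeled sentiment data linked above.
df = pd.read_csv("sentiment_train_en.csv")

class SentimentDataset(Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(list(texts), truncation=True, padding=True)
        self.labels = list(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# "bert-base-uncased" for the English classifier; "xlm-roberta-base" would be
# the analogous starting point for the Spanish and cross-lingual classifiers.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

train_ds = SentimentDataset(df["text"], df["label"], tokenizer)
args = TrainingArguments(output_dir="sentiment-bert-en",  # illustrative settings
                         num_train_epochs=4,
                         per_device_train_batch_size=32)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```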
MT
We perform machine translation using the open-source Helsinki-NLP/OPUS-MT models, which are available pretrained for easy use here. We opted for this system because it let us easily generate n-best lists and incorporate sentiment-based selection into the decoding step. Because we used pretrained models, we don't perform any training of our own, but these notebooks show how we integrate sentiment scoring into the translation selection process. Another advantage of the Helsinki-NLP models is the wide variety of supported languages, which let us try our approach on many different languages (see the Appendix of our paper for concrete examples).
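A sketch of what this decode-time integration can look like with an OPUS-MT checkpoint; it assumes sentiment scorers like the ones sketched above, and the n-best/beam size of 10 is illustrative rather than the paper's exact setting:

```python
from transformers import MarianMTModel, MarianTokenizer

# OPUS-MT checkpoint for English->Spanish; swap in the model for the
# language pair you need (e.g. "Helsinki-NLP/opus-mt-en-id").
mt_name = "Helsinki-NLP/opus-mt-en-es"
mt_tokenizer = MarianTokenizer.from_pretrained(mt_name)
mt_model = MarianMTModel.from_pretrained(mt_name)

def translate_preserving_sentiment(src, src_score_fn, tgt_score_fn, n_best=10):
    """Beam-search an n-best list, then keep the candidate whose sentiment
    score is closest to the source's (scorers as in the sketches above)."""
    batch = mt_tokenizer([src], return_tensors="pt")
    outputs = mt_model.generate(**batch,
                                num_beams=n_best,
                                num_return_sequences=n_best)
    candidates = mt_tokenizer.batch_decode(outputs, skip_special_tokens=True)
    src_score = src_score_fn(src)
    return min(candidates, key=lambda c: abs(tgt_score_fn(c) - src_score))
```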
Experimental Materials
In human evaluations of the translations, we asked participants to grade translations on both their accuracy (broadly speaking) and their level of sentiment divergence, and also asked them to explain why they thought the sentiment of the source text differed from that of the translation, if applicable. We performed both an English-Spanish and an English-Indonesian evaluation. See the following files for reference:
- The translations that were evaluated
- Source texts (English tweets) deemed to be particularly "idiomatic"
- The evaluation templates themselves
- The notebooks we used in analyzing the results of the human evaluations
License
Citation
Please cite our paper if you use any of the resources in this repo for your research:
@inproceedings{jones-wijaya-2021-sentiment,
    title = "Sentiment-based Candidate Selection for {NMT}",
    author = "Jones, Alexander G and
      Wijaya, Derry",
    booktitle = "Proceedings of the 18th Biennial Machine Translation Summit (Volume 1: Research Track)",
    month = aug,
    year = "2021",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://aclanthology.org/2021.mtsummit-research.16",
    pages = "188--201"
}