An Exploratory Study on Multilingual Quality Estimation

Shuo Sun1, Marina Fomicheva2, Frédéric Blain3, Vishrav Chaudhary4, Ahmed El-Kishky5, Adithya Renduchintala1, Francisco Guzmán4, Lucia Specia6
1Johns Hopkins University, 2University of Sheffield, 3University of Wolverhampton, 4Facebook, 5Facebook AI, 6Imperial College London


Predicting the quality of machine translation has traditionally been addressed with language-specific models, under the assumption that the quality label distribution or linguistic features exhibit traits that are not shared across languages. An obvious disadvantage of this approach is the need for labelled data for every language pair. We challenge this assumption by exploring different approaches to multilingual Quality Estimation (QE), including the use of scores from translation models. We show that multilingual models outperform single-language models, particularly when quality label distributions are less balanced and in low-resource settings. In the extreme case of zero-shot QE, we show that it is possible to accurately predict quality for a new language using models trained only on other languages. Our findings indicate that state-of-the-art neural QE models based on powerful pre-trained representations generalise well across languages, making them more applicable in real-world settings.