Cargando…

Unsupervised Quality Estimation Model for English to German Translation and Its Application in Extensive Supervised Evaluation

With the rapid development of machine translation (MT), the MT evaluation becomes very important to timely tell us whether the MT system makes any progress. The conventional MT evaluation methods tend to calculate the similarity between hypothesis translations offered by automatic translation system...

Descripción completa

Detalles Bibliográficos
Autores principales: Han, Aaron L.-F., Wong, Derek F., Chao, Lidia S., He, Liangye, Lu, Yi
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Hindawi Publishing Corporation 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4032676/
https://www.ncbi.nlm.nih.gov/pubmed/24892086
http://dx.doi.org/10.1155/2014/760301
Descripción
Sumario:With the rapid development of machine translation (MT), the MT evaluation becomes very important to timely tell us whether the MT system makes any progress. The conventional MT evaluation methods tend to calculate the similarity between hypothesis translations offered by automatic translation systems and reference translations offered by professional translators. There are several weaknesses in existing evaluation metrics. Firstly, the designed incomprehensive factors result in language-bias problem, which means they perform well on some special language pairs but weak on other language pairs. Secondly, they tend to use no linguistic features or too many linguistic features, of which no usage of linguistic feature draws a lot of criticism from the linguists and too many linguistic features make the model weak in repeatability. Thirdly, the employed reference translations are very expensive and sometimes not available in the practice. In this paper, the authors propose an unsupervised MT evaluation metric using universal part-of-speech tagset without relying on reference translations. The authors also explore the performances of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.