An error analysis for image-based multi-modal neural machine translation

In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain traini...

Bibliographic Details
Main Authors: Calixto, Iacer; Liu, Qun
Format: Online Article Text
Language: English
Published: Springer Netherlands 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579783/
https://www.ncbi.nlm.nih.gov/pubmed/31281206
http://dx.doi.org/10.1007/s10590-019-09226-9
_version_ 1783427901339729920
author Calixto, Iacer
Liu, Qun
author_facet Calixto, Iacer
Liu, Qun
author_sort Calixto, Iacer
collection PubMed
description In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models.
format Online
Article
Text
id pubmed-6579783
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-6579783 2019-07-03 An error analysis for image-based multi-modal neural machine translation Calixto, Iacer Liu, Qun Mach Transl Article In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding different portions of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that not only translations of terms with a strong visual connotation are improved, but almost all kinds of errors decreased when using multi-modal models. Springer Netherlands 2019-04-08 2019 /pmc/articles/PMC6579783/ /pubmed/31281206 http://dx.doi.org/10.1007/s10590-019-09226-9 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle Article
Calixto, Iacer
Liu, Qun
An error analysis for image-based multi-modal neural machine translation
title An error analysis for image-based multi-modal neural machine translation
title_full An error analysis for image-based multi-modal neural machine translation
title_fullStr An error analysis for image-based multi-modal neural machine translation
title_full_unstemmed An error analysis for image-based multi-modal neural machine translation
title_short An error analysis for image-based multi-modal neural machine translation
title_sort error analysis for image-based multi-modal neural machine translation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579783/
https://www.ncbi.nlm.nih.gov/pubmed/31281206
http://dx.doi.org/10.1007/s10590-019-09226-9
work_keys_str_mv AT calixtoiacer anerroranalysisforimagebasedmultimodalneuralmachinetranslation
AT liuqun anerroranalysisforimagebasedmultimodalneuralmachinetranslation
AT calixtoiacer erroranalysisforimagebasedmultimodalneuralmachinetranslation
AT liuqun erroranalysisforimagebasedmultimodalneuralmachinetranslation