An error analysis for image-based multi-modal neural machine translation
In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain traini...
Main Authors: | Calixto, Iacer; Liu, Qun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands 2019 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579783/ https://www.ncbi.nlm.nih.gov/pubmed/31281206 http://dx.doi.org/10.1007/s10590-019-09226-9 |
_version_ | 1783427901339729920 |
---|---|
author | Calixto, Iacer Liu, Qun |
author_facet | Calixto, Iacer Liu, Qun |
author_sort | Calixto, Iacer |
collection | PubMed |
description | In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding a different portion of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that the improvements are not limited to translations of terms with a strong visual connotation: almost all kinds of errors decrease when using multi-modal models. |
format | Online Article Text |
id | pubmed-6579783 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-6579783 2019-07-03 An error analysis for image-based multi-modal neural machine translation Calixto, Iacer Liu, Qun Mach Transl Article In this article, we conduct an extensive quantitative error analysis of different multi-modal neural machine translation (MNMT) models which integrate visual features into different parts of both the encoder and the decoder. We investigate the scenario where models are trained on an in-domain training data set of parallel sentence pairs with images. We analyse two different types of MNMT models that use global and local image features: the former encode an image globally, i.e. there is one feature vector representing an entire image, whereas the latter encode spatial information, i.e. there are multiple feature vectors, each encoding a different portion of the image. We conduct an error analysis of translations generated by different MNMT models as well as text-only baselines, where we study how multi-modal models compare when translating both visual and non-visual terms. In general, we find that the additional multi-modal signals consistently improve translations, even more so when using simpler MNMT models that use global visual features. We also find that the improvements are not limited to translations of terms with a strong visual connotation: almost all kinds of errors decrease when using multi-modal models. Springer Netherlands 2019-04-08 2019 /pmc/articles/PMC6579783/ /pubmed/31281206 http://dx.doi.org/10.1007/s10590-019-09226-9 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
spellingShingle | Article Calixto, Iacer Liu, Qun An error analysis for image-based multi-modal neural machine translation |
title | An error analysis for image-based multi-modal neural machine translation |
title_full | An error analysis for image-based multi-modal neural machine translation |
title_fullStr | An error analysis for image-based multi-modal neural machine translation |
title_full_unstemmed | An error analysis for image-based multi-modal neural machine translation |
title_short | An error analysis for image-based multi-modal neural machine translation |
title_sort | error analysis for image-based multi-modal neural machine translation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579783/ https://www.ncbi.nlm.nih.gov/pubmed/31281206 http://dx.doi.org/10.1007/s10590-019-09226-9 |
work_keys_str_mv | AT calixtoiacer anerroranalysisforimagebasedmultimodalneuralmachinetranslation AT liuqun anerroranalysisforimagebasedmultimodalneuralmachinetranslation AT calixtoiacer erroranalysisforimagebasedmultimodalneuralmachinetranslation AT liuqun erroranalysisforimagebasedmultimodalneuralmachinetranslation |