
Read, spot and translate

We propose multimodal machine translation (MMT) approaches that exploit the correspondences between words and image regions. In contrast to existing work, our referential grounding method considers objects as the visual unit for grounding, rather than whole images or abstract image regions, and performs visual grounding in the source language, rather than at the decoding stage via attention. We explore two referential grounding approaches: (i) implicit grounding, where the model jointly learns how to ground the source language in the visual representation and to translate; and (ii) explicit grounding, where grounding is performed independently of the translation model and is subsequently used to guide machine translation. We performed experiments on the Multi30K dataset for three language pairs: English–German, English–French and English–Czech. Our referential grounding models outperform existing MMT models according to automatic and human evaluation metrics.
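As a rough illustration of the implicit grounding idea described in the abstract, the sketch below lets each source-word embedding attend over detected-object features before encoding, so that visual grounding happens on the source side rather than in the decoder. Everything here (the module layout, dimensions, fusion scheme, and the use of region-detector features such as Faster R-CNN RoI vectors) is an assumption for illustration only; this record does not specify the authors' actual architecture.

# Hypothetical sketch only; not the authors' model.
import torch
import torch.nn as nn

class ImplicitlyGroundedEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, obj_feat_dim=2048, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Project object-detector features (assumed 2048-d RoI vectors)
        # into the text embedding space.
        self.obj_proj = nn.Linear(obj_feat_dim, d_model)
        # Cross-attention: each source word queries the object features,
        # i.e. grounding happens before encoding, not in the decoder.
        self.ground = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, src_tokens, obj_feats):
        # src_tokens: (batch, src_len); obj_feats: (batch, n_objects, obj_feat_dim)
        x = self.embed(src_tokens)
        v = self.obj_proj(obj_feats)
        grounded, _ = self.ground(query=x, key=v, value=v)
        # Fuse the grounded visual context into the word representations,
        # then encode as usual; a standard translation decoder (not shown)
        # would consume this encoder output.
        return self.encoder(x + grounded)

# Toy usage with random data.
enc = ImplicitlyGroundedEncoder(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))   # two sentences, 7 tokens each
objs = torch.randn(2, 5, 2048)         # five detected objects per image
print(enc(src, objs).shape)            # torch.Size([2, 7, 256])

In the explicit variant mentioned in the abstract, the grounding step would instead be trained separately from the translation model and its output used to guide an otherwise unchanged translator.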

Bibliographic Details

Main Authors: Specia, Lucia; Wang, Josiah; Lee, Sun Jae; Ostapenko, Alissa; Madhyastha, Pranava
Format: Online Article Text
Language: English
Published: Springer Netherlands, 2021
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550676/
https://www.ncbi.nlm.nih.gov/pubmed/34776635
http://dx.doi.org/10.1007/s10590-021-09259-z
Journal: Machine Translation (Mach Transl). Published online 4 April 2021.
License: © The Author(s) 2021. Open Access under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).