
A Survey of Orthographic Information in Machine Translation

Machine translation is one of the applications of natural language processing that has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for t...


Bibliographic Details
Main authors: Chakravarthi, Bharathi Raja; Rani, Priya; Arcan, Mihael; McCrae, John P.
Format: Online Article Text
Language: English
Published: Springer Singapore 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550410/
https://www.ncbi.nlm.nih.gov/pubmed/34723204
http://dx.doi.org/10.1007/s42979-021-00723-4
_version_ 1784590954025451520
author Chakravarthi, Bharathi Raja
Rani, Priya
Arcan, Mihael
McCrae, John P.
author_sort Chakravarthi, Bharathi Raja
collection PubMed
description Machine translation is one of the applications of natural language processing that has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which cause many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthography’s influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts using cognate information at different levels of machine translation and the lessons that can be drawn from this. Additionally, multilingual neural machine translation of closely related languages is given a particular focus in this survey. This article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction.
format Online
Article
Text
id pubmed-8550410
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Singapore
record_format MEDLINE/PubMed
spelling pubmed-8550410 2021-10-29 A Survey of Orthographic Information in Machine Translation Chakravarthi, Bharathi Raja Rani, Priya Arcan, Mihael McCrae, John P. SN Comput Sci Survey Article
Springer Singapore 2021-06-07 2021 /pmc/articles/PMC8550410/ /pubmed/34723204 http://dx.doi.org/10.1007/s42979-021-00723-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title A Survey of Orthographic Information in Machine Translation
title_sort survey of orthographic information in machine translation
topic Survey Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8550410/
https://www.ncbi.nlm.nih.gov/pubmed/34723204
http://dx.doi.org/10.1007/s42979-021-00723-4