
Improving neural machine translation with POS-tag features for low-resource language pairs

Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
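As a rough, hypothetical illustration of the factored-input idea summarized in the abstract above (this is not the authors' code), the Python sketch below builds the two parallel source streams that a multi-source model of this kind could consume: the plain token sequence and the same sequence with a universal POS (UPOS) tag attached to each token. The spaCy English pipeline and the word|TAG join format are assumptions made for illustration only; the paper's actual tokenizers and taggers for Thai and Myanmar are not reproduced here.

    # Minimal sketch: derive the two encoder inputs described in the abstract,
    # (1) plain string data and (2) string data with UPOS tags attached.
    # Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def make_source_streams(sentence):
        """Return (plain tokens, tokens factored with UPOS tags) for one sentence."""
        doc = nlp(sentence)
        plain = " ".join(tok.text for tok in doc)
        # Hypothetical word|TAG factoring, e.g. "translation|NOUN"
        factored = " ".join(f"{tok.text}|{tok.pos_}" for tok in doc)
        return plain, factored

    plain, factored = make_source_streams("Neural machine translation improves with POS tags.")
    print(plain)     # first encoder input: string data
    print(factored)  # second encoder input: string data with UPOS tags

In the multi-source setup described above, the two streams would feed two encoders attended to jointly by the decoder; the "shared" variant presumably ties encoder parameters, though the record itself does not specify this.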


Bibliographic Details
Main Authors: Hlaing, Zar Zar, Thu, Ye Kyaw, Supnithi, Thepchai, Netisopakul, Ponrudee
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/
https://www.ncbi.nlm.nih.gov/pubmed/36033261
http://dx.doi.org/10.1016/j.heliyon.2022.e10375
_version_ 1784773616712286208
author Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
author_facet Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
author_sort Hlaing, Zar Zar
collection PubMed
description Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
format Online
Article
Text
id pubmed-9404341
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9404341 2022-08-26 Improving neural machine translation with POS-tag features for low-resource language pairs Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee Heliyon Research Article Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models. Elsevier 2022-08-22 /pmc/articles/PMC9404341/ /pubmed/36033261 http://dx.doi.org/10.1016/j.heliyon.2022.e10375 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
Improving neural machine translation with POS-tag features for low-resource language pairs
title Improving neural machine translation with POS-tag features for low-resource language pairs
title_full Improving neural machine translation with POS-tag features for low-resource language pairs
title_fullStr Improving neural machine translation with POS-tag features for low-resource language pairs
title_full_unstemmed Improving neural machine translation with POS-tag features for low-resource language pairs
title_short Improving neural machine translation with POS-tag features for low-resource language pairs
title_sort improving neural machine translation with pos-tag features for low-resource language pairs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/
https://www.ncbi.nlm.nih.gov/pubmed/36033261
http://dx.doi.org/10.1016/j.heliyon.2022.e10375
work_keys_str_mv AT hlaingzarzar improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT thuyekyaw improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT supnithithepchai improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT netisopakulponrudee improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs