
Improving neural machine translation with POS-tag features for low-resource language pairs

Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
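As a rough, hypothetical illustration of the factored-input idea summarized in the abstract above (this is not the authors' code), the Python sketch below builds the two parallel source streams that a multi-source model of this kind could consume: the plain token sequence and the same sequence with a universal POS (UPOS) tag attached to each token. The spaCy English pipeline and the word|TAG join format are assumptions made for illustration only; the paper's actual tokenizers and taggers for Thai and Myanmar are not reproduced here.

    # Minimal sketch: derive the two encoder inputs described in the abstract,
    # (1) plain string data and (2) string data with UPOS tags attached.
    # Assumes spaCy and its small English model are installed:
    #   pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def make_source_streams(sentence):
        """Return (plain tokens, tokens factored with UPOS tags) for one sentence."""
        doc = nlp(sentence)
        plain = " ".join(tok.text for tok in doc)
        # Hypothetical word|TAG factoring, e.g. "translation|NOUN"
        factored = " ".join(f"{tok.text}|{tok.pos_}" for tok in doc)
        return plain, factored

    plain, factored = make_source_streams("Neural machine translation improves with POS tags.")
    print(plain)     # first encoder input: string data
    print(factored)  # second encoder input: string data with UPOS tags

In the multi-source setup described above, the two streams would feed two encoders attended to jointly by the decoder; the "shared" variant presumably ties encoder parameters, though the record itself does not specify this.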


Bibliographic Details
Main Authors: Hlaing, Zar Zar, Thu, Ye Kyaw, Supnithi, Thepchai, Netisopakul, Ponrudee
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/
https://www.ncbi.nlm.nih.gov/pubmed/36033261
http://dx.doi.org/10.1016/j.heliyon.2022.e10375
_version_ 1784773616712286208
author Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
author_facet Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
author_sort Hlaing, Zar Zar
collection PubMed
description Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models.
format Online
Article
Text
id pubmed-9404341
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9404341 2022-08-26 Improving neural machine translation with POS-tag features for low-resource language pairs Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee Heliyon Research Article Integrating linguistic features has been widely utilized in statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and the proposed models are trained and evaluated on these inputs. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline for comparison with the proposed models; an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline. The experimental results show that adding linguistic features to the transformer-based models improves neural machine translation performance for low-resource language pairs. Moreover, the shared-multi-source transformer models with linguistic features yielded the best translation results, achieving significantly higher Bilingual Evaluation Understudy (BLEU) and character n-gram F-score (chrF) scores than the baseline transformer and EDITOR models. Elsevier 2022-08-22 /pmc/articles/PMC9404341/ /pubmed/36033261 http://dx.doi.org/10.1016/j.heliyon.2022.e10375 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Hlaing, Zar Zar
Thu, Ye Kyaw
Supnithi, Thepchai
Netisopakul, Ponrudee
Improving neural machine translation with POS-tag features for low-resource language pairs
title Improving neural machine translation with POS-tag features for low-resource language pairs
title_full Improving neural machine translation with POS-tag features for low-resource language pairs
title_fullStr Improving neural machine translation with POS-tag features for low-resource language pairs
title_full_unstemmed Improving neural machine translation with POS-tag features for low-resource language pairs
title_short Improving neural machine translation with POS-tag features for low-resource language pairs
title_sort improving neural machine translation with pos-tag features for low-resource language pairs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/
https://www.ncbi.nlm.nih.gov/pubmed/36033261
http://dx.doi.org/10.1016/j.heliyon.2022.e10375
work_keys_str_mv AT hlaingzarzar improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT thuyekyaw improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT supnithithepchai improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs
AT netisopakulponrudee improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs