Improving neural machine translation with POS-tag features for low-resource language pairs
Linguistic features have been widely integrated into statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented.
Main Authors: | Hlaing, Zar Zar; Thu, Ye Kyaw; Supnithi, Thepchai; Netisopakul, Ponrudee |
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/ https://www.ncbi.nlm.nih.gov/pubmed/36033261 http://dx.doi.org/10.1016/j.heliyon.2022.e10375 |
_version_ | 1784773616712286208 |
author | Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee |
author_facet | Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee |
author_sort | Hlaing, Zar Zar |
collection | PubMed |
description | Linguistic features have been widely integrated into statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and experiments are conducted with the proposed models. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce either string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline, and an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline for comparison with the proposed models. The experimental results show that adding linguistic features to the transformer-based models enhances neural machine translation performance for low-resource language pairs. Moreover, the best translation results were obtained with the shared-multi-source transformer models using linguistic features, which yielded significantly higher Bilingual Evaluation Understudy (BLEU) scores and character n-gram F-scores (chrF) than the baseline transformer and EDITOR models. |
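The description above states that the multi-source models consume two parallel inputs: the plain token sequence and the same sequence with a POS or UPOS tag attached to each word. The paper's preprocessing code is not reproduced in this record; the short Python sketch below is a hypothetical illustration of how such a tag-augmented input stream could be prepared (the helper name `attach_pos_tags`, the "word|TAG" joining format, and the example tags are assumptions for illustration, not taken from the paper).

```python
# Hypothetical sketch (not from the paper): building the tag-augmented input
# stream used alongside the plain word stream in a multi-source NMT setup.
# The "word|TAG" token format and the example UPOS tags are assumptions.

def attach_pos_tags(tokens, tags, joiner="|"):
    """Pair each token with its POS/UPOS tag, e.g. 'book' + 'NOUN' -> 'book|NOUN'."""
    assert len(tokens) == len(tags), "one tag per token is required"
    return [f"{tok}{joiner}{tag}" for tok, tag in zip(tokens, tags)]

# Example English sentence with illustrative UPOS tags.
tokens = ["I", "read", "the", "book"]
tags = ["PRON", "VERB", "DET", "NOUN"]

plain_source = " ".join(tokens)                           # input 1: string data
tagged_source = " ".join(attach_pos_tags(tokens, tags))   # input 2: string data with POS tags

print(plain_source)   # -> I read the book
print(tagged_source)  # -> I|PRON read|VERB the|DET book|NOUN
```

In a multi-source setup of the kind described, `plain_source` and `tagged_source` would then serve as the two inputs to the model.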
format | Online Article Text |
id | pubmed-9404341 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9404341 2022-08-26 Improving neural machine translation with POS-tag features for low-resource language pairs Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee Heliyon Research Article Linguistic features have been widely integrated into statistical machine translation (SMT) systems, resulting in improved translation quality. However, for low-resource languages such as Thai and Myanmar, the integration of linguistic features into neural machine translation (NMT) systems has yet to be implemented. In this study, we propose transformer-based NMT models (transformer, multi-source transformer, and shared-multi-source transformer models) that use linguistic features for two-way translation of Thai-to-Myanmar, Myanmar-to-English, and Thai-to-English. Linguistic features such as part-of-speech (POS) tags or universal part-of-speech (UPOS) tags are added to each word on the source side, the target side, or both, and experiments are conducted with the proposed models. The multi-source transformer and shared-multi-source transformer models take two inputs (i.e., string data and string data with POS tags) and produce either string data or string data with POS tags. A transformer model that uses only word vectors served as the first baseline, and an Edit-Based Transformer with Repositioning (EDITOR) model served as the second baseline for comparison with the proposed models. The experimental results show that adding linguistic features to the transformer-based models enhances neural machine translation performance for low-resource language pairs. Moreover, the best translation results were obtained with the shared-multi-source transformer models using linguistic features, which yielded significantly higher Bilingual Evaluation Understudy (BLEU) scores and character n-gram F-scores (chrF) than the baseline transformer and EDITOR models. Elsevier 2022-08-22 /pmc/articles/PMC9404341/ /pubmed/36033261 http://dx.doi.org/10.1016/j.heliyon.2022.e10375 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Hlaing, Zar Zar Thu, Ye Kyaw Supnithi, Thepchai Netisopakul, Ponrudee Improving neural machine translation with POS-tag features for low-resource language pairs |
title | Improving neural machine translation with POS-tag features for low-resource language pairs |
title_full | Improving neural machine translation with POS-tag features for low-resource language pairs |
title_fullStr | Improving neural machine translation with POS-tag features for low-resource language pairs |
title_full_unstemmed | Improving neural machine translation with POS-tag features for low-resource language pairs |
title_short | Improving neural machine translation with POS-tag features for low-resource language pairs |
title_sort | improving neural machine translation with pos-tag features for low-resource language pairs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9404341/ https://www.ncbi.nlm.nih.gov/pubmed/36033261 http://dx.doi.org/10.1016/j.heliyon.2022.e10375 |
work_keys_str_mv | AT hlaingzarzar improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs AT thuyekyaw improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs AT supnithithepchai improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs AT netisopakulponrudee improvingneuralmachinetranslationwithpostagfeaturesforlowresourcelanguagepairs |