Extracting Parallel Sentences from Nonparallel Corpora Using Parallel Hierarchical Attention Network

Bibliographic Details
Main Authors: Zhu, Shaolin; Yang, Yong; Xu, Chun
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects: Review Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7482026/
https://www.ncbi.nlm.nih.gov/pubmed/32952544
http://dx.doi.org/10.1155/2020/8823906
collection PubMed
description Collecting parallel sentences from nonparallel data is a long-standing research problem in natural language processing. In particular, parallel training sentences are crucial to the quality of machine translation systems. While many existing methods have shown encouraging results, they cannot learn varying alignment weights in parallel sentences. To address this issue, we propose a novel parallel hierarchical attention neural network that encodes monolingual sentences versus bilingual sentences, and we construct a classifier to extract parallel sentences. In particular, our attention mechanism can learn different alignment weights for the words in parallel sentences. Experimental results show that our model achieves state-of-the-art performance on the English-French, English-German, and English-Chinese datasets of the BUCC 2017 shared task on parallel sentence extraction.
id pubmed-7482026
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7482026 2020-09-18
Comput Intell Neurosci
Review Article
Hindawi 2020-09-01
Copyright © 2020 Shaolin Zhu et al. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Review Article