Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision

Bibliographic Details
Main Authors:	Shi, Xiayang, Yue, Ping, Liu, Xinyi, Xu, Chun, Xu, Lin
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9365574/ https://www.ncbi.nlm.nih.gov/pubmed/35965766 http://dx.doi.org/10.1155/2022/5296946

Description
Summary:	Machine translation relies on parallel sentences, and their quantity is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology for obtaining parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantic representations by establishing a cross-lingual mapping between the two languages' monolingual representations via the seed lexicon. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences in low-resource language pairs.
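To make the idea of extracting parallel sentences from a small seed lexicon concrete, here is a minimal sketch. It is not the authors' method: in place of the learned cross-lingual mapping and the deep learning classifier, it uses a plain lexical-coverage score, a simple dictionary-based baseline. For each source sentence, the sketch keeps the best-matching target sentence if that match clears a threshold. The toy English-Chinese lexicon, the names `SEED_LEXICON`, `coverage_score` and `extract_pairs`, and the threshold of 0.5 are all hypothetical. Sentences are assumed to be tokenized already, so Chinese word segmentation is out of scope.

```python
from typing import Dict, List, Tuple

# Hypothetical toy seed lexicon (source word -> target word). The paper's
# Uyghur-Chinese lexicon of a few hundred entries is not reproduced here.
SEED_LEXICON: Dict[str, str] = {
    "cat": "猫",
    "eat": "吃",
    "fish": "鱼",
    "dog": "狗",
    "run": "跑",
}


def coverage_score(src: List[str], tgt: List[str], lexicon: Dict[str, str]) -> float:
    """Symmetric lexical coverage of a tokenized sentence pair.

    Source side: fraction of source tokens whose lexicon translation occurs in tgt.
    Target side: fraction of target tokens that translate some source token.
    Returns the mean of both fractions, or 0.0 if either side is empty.
    """
    if not src or not tgt:
        return 0.0
    tgt_set = set(tgt)
    translated = {lexicon[w] for w in src if w in lexicon}
    src_cov = sum(1 for w in src if w in lexicon and lexicon[w] in tgt_set) / len(src)
    tgt_cov = sum(1 for w in tgt if w in translated) / len(tgt)
    return (src_cov + tgt_cov) / 2


def extract_pairs(
    src_sents: List[List[str]],
    tgt_sents: List[List[str]],
    lexicon: Dict[str, str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """For each source sentence, keep its best-scoring target sentence
    as (source index, target index, score) if the score reaches threshold."""
    pairs: List[Tuple[int, int, float]] = []
    for i, s in enumerate(src_sents):
        best_j, best = -1, 0.0
        for j, t in enumerate(tgt_sents):
            score = coverage_score(s, t, lexicon)
            if score > best:
                best_j, best = j, score
        if best_j >= 0 and best >= threshold:
            pairs.append((i, best_j, best))
    return pairs


if __name__ == "__main__":
    src = [["cat", "eat", "fish"], ["dog", "run"], ["hello", "world"]]
    tgt = [["狗", "跑"], ["猫", "吃", "鱼"], ["你好"]]
    print(extract_pairs(src, tgt, SEED_LEXICON))  # [(0, 1, 1.0), (1, 0, 1.0)]
```

The third source sentence has no lexicon coverage, so it is dropped rather than forced into a pair. This precision-over-recall choice matters when harvesting from comparable rather than truly parallel text. A learned classifier, as described in the summary, would replace `coverage_score` with a model operating on cross-lingual representations.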