
Creating a list of word alignments from parallel Russian simplification data

This work describes the development of a list of monolingual word alignments taken from parallel Russian simplification data. These word lists can be used in such lexical simplification tasks as rule-based simplification applications and lexically constrained decoding for neural machine translation m...


Bibliographic Details
Main authors: Dmitrieva, Anna, Laposhina, Antonina, Lebedeva, Maria Yuryevna
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9510348/
https://www.ncbi.nlm.nih.gov/pubmed/36171800
http://dx.doi.org/10.3389/frai.2022.984759
author Dmitrieva, Anna
Laposhina, Antonina
Lebedeva, Maria Yuryevna
collection PubMed
description This work describes the development of a list of monolingual word alignments taken from parallel Russian simplification data. These word lists can be used in such lexical simplification tasks as rule-based simplification applications and lexically constrained decoding for neural machine translation models. Moreover, they constitute a valuable source of information for developing educational materials for teaching Russian as a second/foreign language. In this work, a word list was compiled automatically and post-edited by human experts. The resulting list contains 1409 word pairs in which each “complex” word has an equivalent “simpler” (shorter, more frequent, modern, international) synonym. We studied the contents of the word list by comparing the frequencies of the words in the pairs and their levels in the special CEFR-graded vocabulary lists for learners of Russian as a foreign language. The evaluation demonstrated that lexical simplification by means of single-word synonym replacement does not occur often in the adapted texts. The resulting list also illustrates the peculiarities of the lexical simplification task for L2 learners, such as the choice of a less frequent but international word.
format Online
Article
Text
id pubmed-9510348
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9510348 2022-09-27 Creating a list of word alignments from parallel Russian simplification data. Dmitrieva, Anna; Laposhina, Antonina; Lebedeva, Maria Yuryevna. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A. 2022-09-12 /pmc/articles/PMC9510348/ /pubmed/36171800 http://dx.doi.org/10.3389/frai.2022.984759 Text en
Copyright © 2022 Dmitrieva, Laposhina and Lebedeva. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Creating a list of word alignments from parallel Russian simplification data
topic Artificial Intelligence