Sequence-to-Sequence Models for Automated Text Simplification

A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow reader...

Bibliographic Details
Main authors: Botarleanu, Robert-Mihai; Dascalu, Mihai; Crossley, Scott Andrew; McNamara, Danielle S.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334681/
http://dx.doi.org/10.1007/978-3-030-52240-7_6
_version_ 1783553978501431296
author Botarleanu, Robert-Mihai
Dascalu, Mihai
Crossley, Scott Andrew
McNamara, Danielle S.
author_facet Botarleanu, Robert-Mihai
Dascalu, Mihai
Crossley, Scott Andrew
McNamara, Danielle S.
author_sort Botarleanu, Robert-Mihai
collection PubMed
description A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels.
format Online
Article
Text
id pubmed-7334681
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7334681 2020-07-06 Sequence-to-Sequence Models for Automated Text Simplification Botarleanu, Robert-Mihai Dascalu, Mihai Crossley, Scott Andrew McNamara, Danielle S. Artificial Intelligence in Education Article A key writing skill is the capability to clearly convey desired meaning using available linguistic knowledge. Consequently, writers must select from a large array of idioms, vocabulary terms that are semantically equivalent, and discourse features that simultaneously reflect content and allow readers to grasp meaning. In many cases, a simplified version of a text is needed to ensure comprehension on the part of a targeted audience (e.g., second language learners). To address this need, we propose an automated method to simplify texts based on paraphrasing. Specifically, we explore the potential for a deep learning model, previously used for machine translation, to learn a simplified version of the English language within the context of short phrases. The best model, based on a Universal Transformer architecture, achieved a BLEU score of 66.01. We also evaluated this model’s capability to perform similar transformations on texts that were simplified by human experts at different levels. 2020-06-10 /pmc/articles/PMC7334681/ http://dx.doi.org/10.1007/978-3-030-52240-7_6 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Botarleanu, Robert-Mihai
Dascalu, Mihai
Crossley, Scott Andrew
McNamara, Danielle S.
Sequence-to-Sequence Models for Automated Text Simplification
title Sequence-to-Sequence Models for Automated Text Simplification
title_full Sequence-to-Sequence Models for Automated Text Simplification
title_fullStr Sequence-to-Sequence Models for Automated Text Simplification
title_full_unstemmed Sequence-to-Sequence Models for Automated Text Simplification
title_short Sequence-to-Sequence Models for Automated Text Simplification
title_sort sequence-to-sequence models for automated text simplification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334681/
http://dx.doi.org/10.1007/978-3-030-52240-7_6
work_keys_str_mv AT botarleanurobertmihai sequencetosequencemodelsforautomatedtextsimplification
AT dascalumihai sequencetosequencemodelsforautomatedtextsimplification
AT crossleyscottandrew sequencetosequencemodelsforautomatedtextsimplification
AT mcnamaradanielles sequencetosequencemodelsforautomatedtextsimplification