Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian

Bibliographic Details
Main authors: Brunato, Dominique; Dell'Orletta, Felice; Venturi, Giulia
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Psychology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8958033/
https://www.ncbi.nlm.nih.gov/pubmed/35350726
http://dx.doi.org/10.3389/fpsyg.2022.707630
author Brunato, Dominique
Dell'Orletta, Felice
Venturi, Giulia
collection PubMed
description In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages, focusing on the approach adopted for their construction. We make the main distinction between manual and (semi-)automatic approaches in order to investigate in which respects complex and simple texts vary, and whether and how the observed modifications may depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since Italian is the only language, with the exception of English, for which large parallel resources have been derived through both of the approaches considered. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process; the second examines the results of a linguistic profiling analysis, based on Natural Language Processing techniques, carried out on the original and the simple versions of the same texts. For both levels of analysis, we chose to focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence.
format Online
Article
Text
id pubmed-8958033
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8958033, 2022-03-28. Front Psychol (Psychology). Frontiers Media S.A., 2022-03-08. Copyright © 2022 Brunato, Dell'Orletta and Venturi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Linguistically-Based Comparison of Different Approaches to Building Corpora for Text Simplification: A Case Study on Italian
topic Psychology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8958033/
https://www.ncbi.nlm.nih.gov/pubmed/35350726
http://dx.doi.org/10.3389/fpsyg.2022.707630