Identifying communicative functions in discourse with content types

Bibliographic Details
Main authors: Caselli, Tommaso; Sprugnoli, Rachele; Moretti, Giovanni
Format: Online Article Text
Language: English
Published: Springer Netherlands 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8335719/
https://www.ncbi.nlm.nih.gov/pubmed/34366751
http://dx.doi.org/10.1007/s10579-021-09554-4
_version_ 1783733178716913664
author Caselli, Tommaso
Sprugnoli, Rachele
Moretti, Giovanni
collection PubMed
description Texts are not monolithic entities but rather coherent collections of micro illocutionary acts which help to convey a unitary message of content and purpose. Identifying such text segments is challenging because they require a fine-grained level of analysis even within a single sentence. At the same time, accessing them facilitates the analysis of the communicative functions of a text as well as the identification of relevant information. We propose an empirical framework for modelling micro illocutionary acts at clause level, which we call content types, grounded in linguistic theories of text types, in particular the framework proposed by Werlich in 1976. We make available a newly annotated corpus of 279 documents (more than 180,000 tokens in total) belonging to different genres and temporal periods, based on a dedicated annotation scheme. We obtain an average Cohen’s kappa of 0.89 at token level. We achieve an average F1 score of 74.99% on the automatic classification of content types using a bi-LSTM model. Similar results are obtained on contemporary and historical documents, while performance across genres is more varied. This work promotes a discourse-oriented approach to information extraction and cross-fertilisation across disciplines through computationally aided linguistic analysis.
format Online
Article
Text
id pubmed-8335719
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer Netherlands
record_format MEDLINE/PubMed
spelling pubmed-8335719 2021-08-04 Identifying communicative functions in discourse with content types Caselli, Tommaso Sprugnoli, Rachele Moretti, Giovanni Lang Resour Eval Original Paper
Springer Netherlands 2021-08-04 2022 /pmc/articles/PMC8335719/ /pubmed/34366751 http://dx.doi.org/10.1007/s10579-021-09554-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Identifying communicative functions in discourse with content types
topic Original Paper