
The Entropy of Digital Texts—The Mathematical Background of Correctness

Based on Shannon’s communication theory, in the present paper, we provide the theoretical background to finding an objective measurement—the text-entropy—that can describe the quality of digital natural language documents handled with word processors. The text-entropy can be calculated from the form...


Bibliographic Details
Main Authors: Csernoch, Mária, Nagy, Keve, Nagy, Tímea
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955509/
https://www.ncbi.nlm.nih.gov/pubmed/36832668
http://dx.doi.org/10.3390/e25020302
author Csernoch, Mária
Nagy, Keve
Nagy, Tímea
collection PubMed
description Based on Shannon’s communication theory, this paper provides the theoretical background for an objective measurement, the text-entropy, that can describe the quality of digital natural language documents handled with word processors. The text-entropy can be calculated from the formatting, correction, and modification entropy, and based on these values, we are able to tell how correct or how erroneous digital text-based documents are. To present how the theory can be applied to real-world texts, three erroneous MS Word documents were selected for the present study. With these examples, we demonstrate how to build their correcting, formatting, and modification algorithms and how to calculate the time spent on modification and the entropy of the completed tasks, in both the original erroneous and the corrected documents. In general, it was found that using and modifying properly edited and formatted digital texts requires fewer or an equal number of knowledge-items. In information-theoretic terms, this means that less data must be put on the communication channel than in the case of erroneous documents. The analysis also revealed that in the corrected documents not only is the quantity of data lower, but the quality of the data (knowledge pieces) is higher. As a consequence of these two findings, it is proven that the modification time of erroneous documents is several times that of the correct ones, even in the case of minimal first-level actions. It is also proven that to avoid repeating time- and resource-consuming actions, documents must be corrected before they are modified.
format Online
Article
Text
id pubmed-9955509
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9955509 2023-02-25 Entropy (Basel) Article MDPI 2023-02-06 /pmc/articles/PMC9955509/ /pubmed/36832668 http://dx.doi.org/10.3390/e25020302 Text en
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title The Entropy of Digital Texts—The Mathematical Background of Correctness
topic Article