The Entropy of Digital Texts—The Mathematical Background of Correctness
Main authors: | Csernoch, Mária; Nagy, Keve; Nagy, Tímea |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955509/ https://www.ncbi.nlm.nih.gov/pubmed/36832668 http://dx.doi.org/10.3390/e25020302 |
author | Csernoch, Mária; Nagy, Keve; Nagy, Tímea |
---|---|
collection | PubMed |
description | Based on Shannon’s communication theory, in the present paper, we provide the theoretical background for an objective measure, the text-entropy, that can describe the quality of digital natural language documents handled with word processors. The text-entropy can be calculated from the formatting, correction, and modification entropy, and based on these values, we are able to tell how correct or how erroneous digital text-based documents are. To show how the theory can be applied to real-world texts, three erroneous MS Word documents were selected for the present study. With these examples, we demonstrate how to build their correcting, formatting, and modification algorithms, and how to calculate the time spent on modification and the entropy of the completed tasks, in both the original erroneous and the corrected documents. In general, it was found that using and modifying properly edited and formatted digital texts requires fewer or an equal number of knowledge-items. In information-theoretic terms, this means that less data must be put on the communication channel than in the case of erroneous documents. The analysis also revealed that in the corrected documents not only is the quantity of data smaller, but the quality of the data (knowledge pieces) is higher. As a consequence of these two findings, it is proven that the modification time of erroneous documents is several times that of the correct ones, even in the case of minimal first-level actions. It is also proven that, to avoid repeating time- and resource-consuming actions, documents must be corrected before they are modified. |
format | Online Article Text |
id | pubmed-9955509 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Entropy (Basel) |
published | 2023-02-06 |
rights | © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | The Entropy of Digital Texts—The Mathematical Background of Correctness |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9955509/ https://www.ncbi.nlm.nih.gov/pubmed/36832668 http://dx.doi.org/10.3390/e25020302 |