
Recurrence Networks in Natural Languages

We present a study of natural language using the recurrence network method. In our approach, the repetition of patterns of characters is evaluated without considering the word structure in written texts from different natural languages. Our dataset comprises 85 eBooks written in 17 different E...


Bibliographic Details
Main Authors:	Baeza-Blancas, Edgar, Obregón-Quintana, Bibiana, Hernández-Gómez, Candelario, Gómez-Meléndez, Domingo, Aguilar-Velázquez, Daniel, Liebovitch, Larry S., Guzmán-Vargas, Lev
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515007/ https://www.ncbi.nlm.nih.gov/pubmed/33267231 http://dx.doi.org/10.3390/e21050517

_version_	1783586719265718272
author	Baeza-Blancas, Edgar Obregón-Quintana, Bibiana Hernández-Gómez, Candelario Gómez-Meléndez, Domingo Aguilar-Velázquez, Daniel Liebovitch, Larry S. Guzmán-Vargas, Lev
author_facet	Baeza-Blancas, Edgar Obregón-Quintana, Bibiana Hernández-Gómez, Candelario Gómez-Meléndez, Domingo Aguilar-Velázquez, Daniel Liebovitch, Larry S. Guzmán-Vargas, Lev
author_sort	Baeza-Blancas, Edgar
collection	PubMed
description	We present a study of natural language using the recurrence network method. In our approach, the repetition of patterns of characters is evaluated without considering the word structure in written texts from different natural languages. Our dataset comprises 85 eBooks written in 17 different European languages. The similarity between patterns of length m is determined by the Hamming distance and a value r is considered to define a matching between two patterns, i.e., a repetition is defined if the Hamming distance is equal or less than the given threshold value r. In this way, we calculate the adjacency matrix, where a connection between two nodes exists when a matching occurs. Next, the recurrence network is constructed for the texts and some representative network metrics are calculated. Our results show that average values of network density, clustering, and assortativity are larger than their corresponding shuffled versions, while for metrics such as closeness, both original and random sequences exhibit similar values. Moreover, our calculations show similar average values for density among languages that belong to the same linguistic family. In addition, the application of a linear discriminant analysis leads to well-separated clusters of family languages based on the network-density properties. Finally, we discuss our results in the context of the general characteristics of written texts.
format	Online Article Text
id	pubmed-7515007
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-75150072020-11-09 Recurrence Networks in Natural Languages Baeza-Blancas, Edgar Obregón-Quintana, Bibiana Hernández-Gómez, Candelario Gómez-Meléndez, Domingo Aguilar-Velázquez, Daniel Liebovitch, Larry S. Guzmán-Vargas, Lev Entropy (Basel) Article We present a study of natural language using the recurrence network method. In our approach, the repetition of patterns of characters is evaluated without considering the word structure in written texts from different natural languages. Our dataset comprises 85 eBooks written in 17 different European languages. The similarity between patterns of length m is determined by the Hamming distance and a value r is considered to define a matching between two patterns, i.e., a repetition is defined if the Hamming distance is equal or less than the given threshold value r. In this way, we calculate the adjacency matrix, where a connection between two nodes exists when a matching occurs. Next, the recurrence network is constructed for the texts and some representative network metrics are calculated. Our results show that average values of network density, clustering, and assortativity are larger than their corresponding shuffled versions, while for metrics such as closeness, both original and random sequences exhibit similar values. Moreover, our calculations show similar average values for density among languages that belong to the same linguistic family. In addition, the application of a linear discriminant analysis leads to well-separated clusters of family languages based on the network-density properties. Finally, we discuss our results in the context of the general characteristics of written texts. MDPI 2019-05-23 /pmc/articles/PMC7515007/ /pubmed/33267231 http://dx.doi.org/10.3390/e21050517 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Baeza-Blancas, Edgar Obregón-Quintana, Bibiana Hernández-Gómez, Candelario Gómez-Meléndez, Domingo Aguilar-Velázquez, Daniel Liebovitch, Larry S. Guzmán-Vargas, Lev Recurrence Networks in Natural Languages
title	Recurrence Networks in Natural Languages
title_full	Recurrence Networks in Natural Languages
title_fullStr	Recurrence Networks in Natural Languages
title_full_unstemmed	Recurrence Networks in Natural Languages
title_short	Recurrence Networks in Natural Languages
title_sort	recurrence networks in natural languages
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7515007/ https://www.ncbi.nlm.nih.gov/pubmed/33267231 http://dx.doi.org/10.3390/e21050517
work_keys_str_mv	AT baezablancasedgar recurrencenetworksinnaturallanguages AT obregonquintanabibiana recurrencenetworksinnaturallanguages AT hernandezgomezcandelario recurrencenetworksinnaturallanguages AT gomezmelendezdomingo recurrencenetworksinnaturallanguages AT aguilarvelazquezdaniel recurrencenetworksinnaturallanguages AT liebovitchlarrys recurrencenetworksinnaturallanguages AT guzmanvargaslev recurrencenetworksinnaturallanguages
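
The construction summarized in the description field (patterns of length m, Hamming distance, threshold r, adjacency matrix, network density) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and it assumes overlapping character windows that advance one character at a time, an undirected network, and no self-loops, none of which the record itself specifies.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def recurrence_adjacency(text: str, m: int, r: int) -> list[list[int]]:
    """Adjacency matrix of a recurrence network over character patterns.

    Assumption: nodes are the overlapping windows of length m taken at
    every position of the text. Two nodes are linked when the Hamming
    distance between their patterns is at most r.
    """
    patterns = [text[i:i + m] for i in range(len(text) - m + 1)]
    n = len(patterns)
    adj = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if hamming(patterns[i], patterns[j]) <= r:
            adj[i][j] = adj[j][i] = 1  # undirected, no self-loops
    return adj


def density(adj: list[list[int]]) -> float:
    """Fraction of possible undirected links that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return 2 * edges / (n * (n - 1))


if __name__ == "__main__":
    # Toy input: the patterns are "ab", "ba", "ab"; only the two "ab" match at r = 0.
    adj = recurrence_adjacency("abab", m=2, r=0)
    print(adj)           # [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
    print(density(adj))  # one link out of three possible
```

The pairwise comparison is quadratic in the text length, so this sketch only suits short excerpts; a book-length text would need a more economical matching strategy.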
