
Wikipedia Information Flow Analysis Reveals the Scale-Free Architecture of the Semantic Space

In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably at the percolation threshold the...


Bibliographic Details
Main Authors: Masucci, Adolfo Paolo; Kalampokis, Alkiviadis; Eguíluz, Victor Martínez; Hernández-García, Emilio
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3046238/
https://www.ncbi.nlm.nih.gov/pubmed/21407801
http://dx.doi.org/10.1371/journal.pone.0017333
author Masucci, Adolfo Paolo
Kalampokis, Alkiviadis
Eguíluz, Victor Martínez
Hernández-García, Emilio
author_sort Masucci, Adolfo Paolo
collection PubMed
description In this paper we extract the topology of the semantic space in its encyclopedic acception, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy-and-mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
format Text
id pubmed-3046238
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3046238 2011-03-15 PLoS One Research Article Public Library of Science 2011-02-28 /pmc/articles/PMC3046238/ /pubmed/21407801 http://dx.doi.org/10.1371/journal.pone.0017333 Text en Masucci et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Wikipedia Information Flow Analysis Reveals the Scale-Free Architecture of the Semantic Space
topic Research Article