
Probing the Topological Properties of Complex Networks Modeling Short Written Texts

In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has been proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treat...


Bibliographic Details
Main author: Amancio, Diego R.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4342245/
https://www.ncbi.nlm.nih.gov/pubmed/25719799
http://dx.doi.org/10.1371/journal.pone.0118394
_version_ 1782359261230137344
author Amancio, Diego R.
collection PubMed
description In recent years, graph theory has been widely employed to probe several language properties. More specifically, the so-called word adjacency model has proven useful for tackling several practical problems, especially those relying on textual stylistic analysis. The most common approach to treating texts as networks has simply considered either large pieces of text or entire books. This approach has certainly worked well, and many informative discoveries have been made this way, but it raises an uncomfortable question: could there be important topological patterns in small pieces of text? To address this problem, the topological properties of subtexts sampled from entire books were probed. Statistical analyses performed on a dataset comprising 50 novels revealed that most of the traditional topological measurements are stable for short subtexts. When the performance of the authorship recognition task was analyzed, it was found that a proper sampling yields a discriminability similar to the one found with full texts. Surprisingly, the support vector machine classification based on the characterization of short texts outperformed the one performed with entire books. These findings suggest that a local topological analysis of large documents might improve their global characterization. Most importantly, it was verified, as a proof of principle, that short texts can be analyzed with the methods and concepts of complex networks. As a consequence, the techniques described here can be extended in a straightforward fashion to analyze texts as time-varying complex networks.
format Online
Article
Text
id pubmed-4342245
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4342245 2015-03-04 Probing the Topological Properties of Complex Networks Modeling Short Written Texts Amancio, Diego R. PLoS One Research Article Public Library of Science 2015-02-26 /pmc/articles/PMC4342245/ /pubmed/25719799 http://dx.doi.org/10.1371/journal.pone.0118394 Text en © 2015 Diego R. Amancio http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Amancio, Diego R.
Probing the Topological Properties of Complex Networks Modeling Short Written Texts
title Probing the Topological Properties of Complex Networks Modeling Short Written Texts
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4342245/
https://www.ncbi.nlm.nih.gov/pubmed/25719799
http://dx.doi.org/10.1371/journal.pone.0118394
work_keys_str_mv AT amanciodiegor probingthetopologicalpropertiesofcomplexnetworksmodelingshortwrittentexts