
A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Infor...


Bibliographic Details
Main Authors:	Naeni, Leila M.; Craig, Hugh; Berretta, Regina; Moscato, Pablo
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/ https://www.ncbi.nlm.nih.gov/pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988

_version_	1782450631986905088
author	Naeni, Leila M.; Craig, Hugh; Berretta, Regina; Moscato, Pablo
author_facet	Naeni, Leila M.; Craig, Hugh; Berretta, Regina; Moscato, Pablo
author_sort	Naeni, Leila M.
collection	PubMed
description	In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters that reflect the commonalities in the literary style of the plays.
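The two quantities named in the description, the Jensen-Shannon distance and the modularity function, can be illustrated with a minimal Python sketch. This is not the iMA-Net algorithm and not the authors' proximity-graph construction, neither of which is specified in this record; the function names and the toy word counts and graph in the usage note are hypothetical, chosen only for illustration. The sketch assumes base-2 logarithms, the standard Newman-Girvan modularity, and an undirected, unweighted graph.

```python
import math


def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the base-2 Jensen-Shannon divergence between two
    word-frequency dictionaries (word -> count). Result lies in [0, 1]."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for word in set(p_counts) | set(q_counts):
        p = p_counts.get(word, 0) / p_total
        q = q_counts.get(word, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(divergence, 0.0))


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping each node to a community label.
    """
    m = len(edges)
    intra_edges = {}  # number of edges inside each community
    degree_sum = {}   # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra_edges[cu] = intra_edges.get(cu, 0) + 1
    return sum(
        intra_edges.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

As a usage illustration with made-up data, `jensen_shannon_distance({"thou": 3, "art": 1}, {"thou": 1, "art": 3})` compares two tiny word-count profiles, and `modularity(edges, community_of)` scores a candidate grouping of plays once a proximity graph has been built from such distances. A modularity optimiser searches over `community_of` assignments for the highest score.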
format	Online Article Text
id	pubmed-5003342
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5003342 2016-09-12 A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays Naeni, Leila M.; Craig, Hugh; Berretta, Regina; Moscato, Pablo PLoS One Research Article In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters that reflect the commonalities in the literary style of the plays. Public Library of Science 2016-08-29 /pmc/articles/PMC5003342/ /pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988 Text en © 2016 Naeni et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/ https://www.ncbi.nlm.nih.gov/pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988
