
A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Infor...


Bibliographic Details
Main authors: Naeni, Leila M., Craig, Hugh, Berretta, Regina, Moscato, Pablo
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/
https://www.ncbi.nlm.nih.gov/pubmed/27571416
http://dx.doi.org/10.1371/journal.pone.0157988
author Naeni, Leila M.
Craig, Hugh
Berretta, Regina
Moscato, Pablo
collection PubMed
description In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high-quality clusters which reflect the commonalities in the literary style of the plays.
format Online
Article
Text
id pubmed-5003342
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5003342 2016-09-12
PLoS One
Research Article
Public Library of Science 2016-08-29
/pmc/articles/PMC5003342/ /pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988
Text en
© 2016 Naeni et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/
https://www.ncbi.nlm.nih.gov/pubmed/27571416
http://dx.doi.org/10.1371/journal.pone.0157988
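
Illustrative note: the description above names two building blocks, the Jensen-Shannon distance between word-frequency profiles and the modularity of a partition of a proximity graph. The following is a minimal sketch of both quantities, not the authors' implementation. The function names `js_distance` and `modularity`, the base-2 logarithm, and the unweighted-graph form of modularity are assumptions made for illustration; the iMA-Net memetic algorithm itself and the way the article builds its proximity graphs are not described in this record and are not reproduced here.

```python
import math
from collections import Counter


def js_distance(p, q):
    """Jensen-Shannon distance between two word-count mappings.

    Counts are normalised to probabilities; the distance is the square
    root of the Jensen-Shannon divergence (base-2 logs, so it lies in [0, 1]).
    Both mappings are assumed to be non-empty.
    """
    total_p, total_q = sum(p.values()), sum(q.values())
    divergence = 0.0
    for word in set(p) | set(q):
        pw = p.get(word, 0) / total_p
        qw = q.get(word, 0) / total_q
        mw = (pw + qw) / 2  # mixture distribution
        if pw > 0:
            divergence += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            divergence += 0.5 * qw * math.log2(qw / mw)
    return math.sqrt(max(divergence, 0.0))


def modularity(edges, community):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, assumed non-empty; community: node -> label.
    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2.
    """
    m = len(edges)
    internal = Counter()    # edges with both endpoints in the same community
    degree_sum = Counter()  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy data: word counts for two made-up texts.
text_a = {"thou": 4, "art": 2, "king": 1}
text_b = {"thou": 1, "art": 1, "queen": 3}
print(js_distance(text_a, text_b))
```

In a pipeline of the kind the description outlines, pairwise distances such as these would define a proximity graph, and a search procedure would then look for the partition with the highest modularity; the rule used to turn distances into edges is left open here because the record does not state it.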