A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays
In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Infor...
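As an illustration of the dissimilarity measure named in the summary, the following is a minimal sketch of the Jensen-Shannon distance between two word-frequency profiles. It assumes the distance is taken as the square root of the base-2 Jensen-Shannon divergence, which is the common definition; the record does not state which variant the authors use. The function name and the toy word counts are hypothetical.

```python
import math


def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the base-2 Jensen-Shannon divergence between two
    word-count dictionaries (assumed definition, not taken from the record)."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    # Iterate over the union of both vocabularies; absent words count as 0.
    for word in set(p_counts) | set(q_counts):
        p = p_counts.get(word, 0) / p_total
        q = q_counts.get(word, 0) / q_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical toy profiles standing in for per-play word frequencies.
play_a = {"thou": 12, "love": 7, "king": 3}
play_b = {"thou": 10, "love": 9, "crown": 2}
print(jensen_shannon_distance(play_a, play_b))
```

With base-2 logarithms the value lies between 0 (identical profiles) and 1 (profiles with no words in common), so pairwise distances can be compared directly when building a proximity graph.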
Main authors: | Naeni, Leila M.; Craig, Hugh; Berretta, Regina; Moscato, Pablo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/ https://www.ncbi.nlm.nih.gov/pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988 |
_version_ | 1782450631986905088 |
---|---|
author | Naeni, Leila M. Craig, Hugh Berretta, Regina Moscato, Pablo |
author_facet | Naeni, Leila M. Craig, Hugh Berretta, Regina Moscato, Pablo |
author_sort | Naeni, Leila M. |
collection | PubMed |
description | In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. |
format | Online Article Text |
id | pubmed-5003342 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5003342 2016-09-12 A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays Naeni, Leila M. Craig, Hugh Berretta, Regina Moscato, Pablo PLoS One Research Article In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. Public Library of Science 2016-08-29 /pmc/articles/PMC5003342/ /pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988 Text en © 2016 Naeni et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Naeni, Leila M. Craig, Hugh Berretta, Regina Moscato, Pablo A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title | A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title_full | A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title_fullStr | A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title_full_unstemmed | A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title_short | A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays |
title_sort | novel clustering methodology based on modularity optimisation for detecting authorship affinities in shakespearean era plays |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5003342/ https://www.ncbi.nlm.nih.gov/pubmed/27571416 http://dx.doi.org/10.1371/journal.pone.0157988 |
work_keys_str_mv | AT naenileilam anovelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT craighugh anovelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT berrettaregina anovelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT moscatopablo anovelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT naenileilam novelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT craighugh novelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT berrettaregina novelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays AT moscatopablo novelclusteringmethodologybasedonmodularityoptimisationfordetectingauthorshipaffinitiesinshakespeareaneraplays |
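
The description above refers to maximizing the modularity function on a proximity graph. The sketch below only evaluates that objective for a given partition of an unweighted, undirected graph, using the standard Newman-Girvan form Q = Σ_c [L_c/m − (d_c/2m)²]. It is not the iMA-Net memetic algorithm, whose details are not given in this record, and the way the authors build their proximity graphs is likewise not specified here. The function name, the toy edge list, and the community labels are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an unweighted,
    undirected graph given as a list of (u, v) edges."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal_edges = defaultdict(int)  # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)      # d_c: total degree of nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

print(modularity(toy_edges, two_clusters))
print(modularity(toy_edges, one_cluster))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of signal a modularity-maximizing search exploits when grouping plays.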