Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon

Bibliographic Details
Main Authors: Marsden, John; Budden, David; Craig, Hugh; Moscato, Pablo
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694980/
https://www.ncbi.nlm.nih.gov/pubmed/23826143
http://dx.doi.org/10.1371/journal.pone.0066813
author Marsden, John
Budden, David
Craig, Hugh
Moscato, Pablo
collection PubMed
description BACKGROUND: Within the structural and grammatical bounds of a common language, all authors develop their own distinctive writing styles. Whether the relative occurrence of common words can be measured to produce accurate models of authorship is of particular interest. This work introduces a new score that helps to highlight such variations in word occurrence, and is applied to produce models of authorship of a large group of plays from the Shakespearean era.
METHODOLOGY: A text corpus containing 55,055 unique words was generated from 168 plays from the Shakespearean era (16th and 17th centuries) of undisputed authorship. A new score, CM1, is introduced to measure variation patterns based on the frequency of occurrence of each word for the authors John Fletcher, Ben Jonson, Thomas Middleton and William Shakespeare, compared to the rest of the authors in the study (which provides a reference of relative word usage at that time). A total of 50 WEKA methods were applied for Fletcher, Jonson and Middleton, to identify those which were able to produce models yielding over 90% classification accuracy. This ensemble of WEKA methods was then applied to model Shakespearean authorship across all 168 plays, yielding a Matthews' correlation coefficient (MCC) performance of over 90%. Furthermore, the best model yielded an MCC of 99%.
CONCLUSIONS: Our results suggest that different authors, while adhering to the structural and grammatical bounds of a common language, develop measurably distinct styles by the tendency to over-utilise or avoid particular common words and phrasings. Considering language and the potential of words as an abstract chaotic system with a high entropy, similarities can be drawn to the Maxwell's Demon thought experiment; authors subconsciously favour or filter certain words, modifying the probability profile in ways that could reflect their individuality and style.
format Online
Article
Text
id pubmed-3694980
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3694980 2013-07-03 Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon Marsden, John Budden, David Craig, Hugh Moscato, Pablo PLoS One Research Article Public Library of Science 2013-06-27 /pmc/articles/PMC3694980/ /pubmed/23826143 http://dx.doi.org/10.1371/journal.pone.0066813 Text en © 2013 Marsden et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Language Individuation and Marker Words: Shakespeare and His Maxwell's Demon
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694980/
https://www.ncbi.nlm.nih.gov/pubmed/23826143
http://dx.doi.org/10.1371/journal.pone.0066813
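The abstract reports model performance as a Matthews correlation coefficient (MCC), quoted as a percentage (e.g. "an MCC of 99%", presumably a coefficient of about 0.99 on MCC's -1 to 1 scale). For reference, MCC for a binary classifier (here, target author vs. all other authors) is computed from confusion-matrix counts as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below implements that standard formula; it is not the authors' code or WEKA's implementation, the function names `matthews_corrcoef` and `mcc_from_labels` are hypothetical, and the example labels are invented for illustration rather than taken from the study.

```python
import math
from typing import Sequence


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (MCC) from binary confusion-matrix counts.

    The result lies in [-1, 1]: 1 is perfect agreement, 0 is chance level,
    -1 is total disagreement. When any marginal total is zero the formula is
    undefined; this sketch returns 0.0 in that case (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


def mcc_from_labels(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """MCC from parallel label lists (True = play attributed to the target author)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)              # target author, predicted as such
    tn = sum((not t) and (not p) for t, p in pairs)  # other author, predicted as other
    fp = sum((not t) and p for t, p in pairs)        # other author, wrongly attributed
    fn = sum(t and (not p) for t, p in pairs)        # target author, missed
    return matthews_corrcoef(tp, tn, fp, fn)


if __name__ == "__main__":
    # Invented labels for five plays, purely illustrative.
    y_true = [True, True, False, False, False]
    y_pred = [True, False, False, False, True]
    print(mcc_from_labels(y_true, y_pred))  # 1/6, about 0.167
```

MCC is often preferred over plain accuracy for this kind of one-author-versus-rest task because it uses all four confusion-matrix cells, so a classifier cannot score well simply by predicting the majority class.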