Automated Authorship Attribution Using Advanced Signal Classification Techniques
In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing eac...
Main authors: | Ebrahimpour, Maryam; Putniņš, Tālis J.; Berryman, Matthew J.; Allison, Andrew; Ng, Brian W.-H.; Abbott, Derek |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577839/ https://www.ncbi.nlm.nih.gov/pubmed/23437047 http://dx.doi.org/10.1371/journal.pone.0054998 |
_version_ | 1782259984834232320 |
---|---|
author | Ebrahimpour, Maryam; Putniņš, Tālis J.; Berryman, Matthew J.; Allison, Andrew; Ng, Brian W.-H.; Abbott, Derek |
author_facet | Ebrahimpour, Maryam; Putniņš, Tālis J.; Berryman, Matthew J.; Allison, Andrew; Ng, Brian W.-H.; Abbott, Derek |
author_sort | Ebrahimpour, Maryam |
collection | PubMed |
description | In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing each text by stripping it of all characters except a-z and space. This is in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English texts, and use leave-one-out cross validation to demonstrate classification accuracies in excess of 90%. We further test our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. And finally, we apply our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identify where some of the limitations lie, motivating a number of open questions for future work. An open source implementation of our methodology is freely available for use at https://github.com/matthewberryman/author-detection. |
format | Online Article Text |
id | pubmed-3577839 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-35778392013-02-22 Automated Authorship Attribution Using Advanced Signal Classification Techniques Ebrahimpour, Maryam Putniņš, Tālis J. Berryman, Matthew J. Allison, Andrew Ng, Brian W.-H. Abbott, Derek PLoS One Research Article In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing each text by stripping it of all characters except a-z and space. This is in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English texts, and use leave-one-out cross validation to demonstrate classification accuracies in excess of 90%. We further test our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. And finally, we apply our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identify where some of the limitations lie, motivating a number of open questions for future work. An open source implementation of our methodology is freely available for use at https://github.com/matthewberryman/author-detection. Public Library of Science 2013-02-20 /pmc/articles/PMC3577839/ /pubmed/23437047 http://dx.doi.org/10.1371/journal.pone.0054998 Text en © 2013 Ebrahimpour et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Ebrahimpour, Maryam Putniņš, Tālis J. Berryman, Matthew J. Allison, Andrew Ng, Brian W.-H. Abbott, Derek Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title | Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title_full | Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title_fullStr | Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title_full_unstemmed | Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title_short | Automated Authorship Attribution Using Advanced Signal Classification Techniques |
title_sort | automated authorship attribution using advanced signal classification techniques |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3577839/ https://www.ncbi.nlm.nih.gov/pubmed/23437047 http://dx.doi.org/10.1371/journal.pone.0054998 |
work_keys_str_mv | AT ebrahimpourmaryam automatedauthorshipattributionusingadvancedsignalclassificationtechniques AT putninstalisj automatedauthorshipattributionusingadvancedsignalclassificationtechniques AT berrymanmatthewj automatedauthorshipattributionusingadvancedsignalclassificationtechniques AT allisonandrew automatedauthorshipattributionusingadvancedsignalclassificationtechniques AT ngbrianwh automatedauthorshipattributionusingadvancedsignalclassificationtechniques AT abbottderek automatedauthorshipattributionusingadvancedsignalclassificationtechniques |
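
The description field above says each text is stripped of all characters except a-z and space, and that the classification features are based on word frequencies. A minimal Python sketch of that preprocessing step might look like the following. It is not taken from the authors' repository. The function names `preprocess` and `word_frequencies`, the lowercasing of capitals, the conversion of line breaks to spaces, and the use of relative frequencies over a fixed vocabulary are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List


def preprocess(text: str) -> str:
    """Reduce a text to the characters a-z and space."""
    # Assumption: lowercase first, so capital letters are kept rather than dropped.
    lowered = text.lower()
    # Assumption: treat any whitespace (newlines, tabs) as a single space,
    # so words on adjacent lines are not fused together.
    spaced = re.sub(r"\s+", " ", lowered)
    # Strip everything except a-z and space, as the description states.
    stripped = re.sub(r"[^a-z ]", "", spaced)
    # Removing punctuation can leave double spaces; collapse them.
    return re.sub(r" +", " ", stripped).strip()


def word_frequencies(text: str, vocabulary: List[str]) -> List[float]:
    """Relative frequency of each vocabulary word in the preprocessed text."""
    words = preprocess(text).split()
    total = len(words)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(words)
    # Counter returns 0 for words that never occur.
    return [counts[word] / total for word in vocabulary]


if __name__ == "__main__":
    sample = "The Cat's hat,\nthe END!"
    print(preprocess(sample))
    print(word_frequencies(sample, ["the", "hat", "of"]))
```

Feature vectors of this kind, one per text, could then be passed to a classifier such as an SVM and evaluated with leave-one-out cross validation, which are the methods the description names. The choice of vocabulary is left open here because the record does not specify it.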