
Cracking double-blind review: Authorship attribution with deep learning



Bibliographic Details
Main authors: Bauersfeld, Leonard; Romero, Angel; Muglikar, Manasi; Scaramuzza, Davide
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10313031/
https://www.ncbi.nlm.nih.gov/pubmed/37390072
http://dx.doi.org/10.1371/journal.pone.0287611
author Bauersfeld, Leonard
Romero, Angel
Muglikar, Manasi
Scaramuzza, Davide
collection PubMed
description Double-blind peer review is considered a pillar of academic research because it is perceived to ensure a fair, unbiased, and fact-centered scientific discussion. Yet, experienced researchers can often correctly guess from which research group an anonymous submission originates, biasing the peer-review process. In this work, we present a transformer-based, neural-network architecture that only uses the text content and the author names in the bibliography to attribute an anonymous manuscript to an author. To train and evaluate our method, we created the largest authorship-identification dataset to date. It leverages all research papers publicly available on arXiv amounting to over 2 million manuscripts. In arXiv-subsets with up to 2,000 different authors, our method achieves an unprecedented authorship attribution accuracy, where up to 73% of papers are attributed correctly. We present a scaling analysis to highlight the applicability of the proposed method to even larger datasets when sufficient compute capabilities are more widely available to the academic community. Furthermore, we analyze the attribution accuracy in settings where the goal is to identify all authors of an anonymous manuscript. Thanks to our method, we are not only able to predict the author of an anonymous work but we also provide empirical evidence of the key aspects that make a paper attributable. We have open-sourced the necessary tools to reproduce our experiments.
format Online
Article
Text
id pubmed-10313031
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10313031 2023-07-01 Cracking double-blind review: Authorship attribution with deep learning Bauersfeld, Leonard Romero, Angel Muglikar, Manasi Scaramuzza, Davide PLoS One Research Article
Public Library of Science 2023-06-30 /pmc/articles/PMC10313031/ /pubmed/37390072 http://dx.doi.org/10.1371/journal.pone.0287611 Text en © 2023 Bauersfeld et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Cracking double-blind review: Authorship attribution with deep learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10313031/
https://www.ncbi.nlm.nih.gov/pubmed/37390072
http://dx.doi.org/10.1371/journal.pone.0287611