
ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions

Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves who are practically unable to commit necessary effort to keep the data accurate. In relying exclus...


Bibliographic Details
Main Authors:	Albert, Paul J., Dutta, Sarbajit, Lin, Jie, Zhu, Zimeng, Bales, Michael, Johnson, Stephen B., Mansour, Mohammad, Wright, Drew, Wheeler, Terrie R., Cole, Curtis L.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016248/ https://www.ncbi.nlm.nih.gov/pubmed/33793563 http://dx.doi.org/10.1371/journal.pone.0244641

_version_	1783673819218575360
author	Albert, Paul J. Dutta, Sarbajit Lin, Jie Zhu, Zimeng Bales, Michael Johnson, Stephen B. Mansour, Mohammad Wright, Drew Wheeler, Terrie R. Cole, Curtis L.
author_facet	Albert, Paul J. Dutta, Sarbajit Lin, Jie Zhu, Zimeng Bales, Michael Johnson, Stephen B. Mansour, Mohammad Wright, Drew Wheeler, Terrie R. Cole, Curtis L.
author_sort	Albert, Paul J.
collection	PubMed
description	Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves who are practically unable to commit necessary effort to keep the data accurate. In relying exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but, for them to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian’s approach for generating a scholar’s list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally-maintained identity data such as name of department and year of terminal degree to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter uses: up to 12 types of commonly available, identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to more accurately provide feedback on behalf of scholars. To help users to more efficiently curate publication lists, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at an academic private medical center, ReCiter correctly predicted 98% of their publications in PubMed.
format	Online Article Text
id	pubmed-8016248
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-80162482021-04-08 ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions Albert, Paul J. Dutta, Sarbajit Lin, Jie Zhu, Zimeng Bales, Michael Johnson, Stephen B. Mansour, Mohammad Wright, Drew Wheeler, Terrie R. Cole, Curtis L. PLoS One Research Article Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves who are practically unable to commit necessary effort to keep the data accurate. In relying exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but, for them to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian’s approach for generating a scholar’s list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally-maintained identity data such as name of department and year of terminal degree to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter uses: up to 12 types of commonly available, identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to more accurately provide feedback on behalf of scholars. 
To help users to more efficiently curate publication lists, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at an academic private medical center, ReCiter correctly predicted 98% of their publications in PubMed. Public Library of Science 2021-04-01 /pmc/articles/PMC8016248/ /pubmed/33793563 http://dx.doi.org/10.1371/journal.pone.0244641 Text en © 2021 Albert et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Albert, Paul J. Dutta, Sarbajit Lin, Jie Zhu, Zimeng Bales, Michael Johnson, Stephen B. Mansour, Mohammad Wright, Drew Wheeler, Terrie R. Cole, Curtis L. ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title	ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title_full	ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title_fullStr	ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title_full_unstemmed	ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title_short	ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions
title_sort	reciter: an open source, identity-driven, authorship prediction algorithm optimized for academic institutions
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8016248/ https://www.ncbi.nlm.nih.gov/pubmed/33793563 http://dx.doi.org/10.1371/journal.pone.0244641
work_keys_str_mv	AT albertpaulj reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT duttasarbajit reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT linjie reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT zhuzimeng reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT balesmichael reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT johnsonstephenb reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT mansourmohammad reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT wrightdrew reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT wheelerterrier reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions AT colecurtisl reciteranopensourceidentitydrivenauthorshippredictionalgorithmoptimizedforacademicinstitutions
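
The description in this record names three inputs to a candidate article's overall score: identity evidence, whether other members of the article's cluster were accepted or rejected by a user, and the cluster's average score. The Python sketch below illustrates one way such inputs could be combined. It is not ReCiter's actual formula: the additive form, the `Candidate` class, the evidence names, and the `feedback_weight` and `cluster_weight` parameters are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """A candidate article for one scholar (hypothetical structure)."""
    pmid: str
    # Hypothetical per-evidence scores, e.g. {"department": 2.0, "degree_year": 1.0}
    evidence: Dict[str, float] = field(default_factory=dict)
    # Curator feedback on this article: "accepted", "rejected", or None
    feedback: Optional[str] = None


def identity_score(candidate: Candidate) -> float:
    """Sum the identity-evidence scores for one article."""
    return sum(candidate.evidence.values())


def score_cluster(
    cluster: List[Candidate],
    feedback_weight: float = 1.0,   # assumed weight, not from the paper
    cluster_weight: float = 0.5,    # assumed weight, not from the paper
) -> Dict[str, float]:
    """Combine identity evidence, feedback on other cluster members,
    and the cluster's average identity score into one score per article."""
    base = {c.pmid: identity_score(c) for c in cluster}
    cluster_avg = mean(list(base.values()))

    scores = {}
    for c in cluster:
        # Only feedback on the *other* members of the cluster counts here.
        others = [o for o in cluster if o.pmid != c.pmid]
        accepted = sum(o.feedback == "accepted" for o in others)
        rejected = sum(o.feedback == "rejected" for o in others)
        scores[c.pmid] = (
            base[c.pmid]
            + feedback_weight * (accepted - rejected)
            + cluster_weight * cluster_avg
        )
    return scores


if __name__ == "__main__":
    cluster = [
        Candidate("1", {"department": 2.0, "degree_year": 1.0}, "accepted"),
        Candidate("2", {"department": 1.0}),
    ]
    print(score_cluster(cluster))
```

In this sketch, an article with weak identity evidence still gains score when a curator has accepted another article in the same cluster, which mirrors the feedback-driven behavior the description attributes to ReCiter. The record also says the authors tuned the scoring with a support vector machine analysis, so in practice the weights would be derived from data rather than set by hand.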
