
Author Name Disambiguation for PubMed

Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, we designed an author name disambiguation system...


Bibliographic Details
Main authors:	Liu, Wanli; Islamaj Doğan, Rezarta; Kim, Sun; Comeau, Donald C.; Kim, Won; Yeganova, Lana; Lu, Zhiyong; Wilbur, W. John
Format:	Online Article Text
Language:	English
Published:	2013
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530597/ https://www.ncbi.nlm.nih.gov/pubmed/28758138 http://dx.doi.org/10.1002/asi.23063

_version_	1783253294283489280
author	Liu, Wanli; Islamaj Doğan, Rezarta; Kim, Sun; Comeau, Donald C.; Kim, Won; Yeganova, Lana; Lu, Zhiyong; Wilbur, W. John
author_facet	Liu, Wanli; Islamaj Doğan, Rezarta; Kim, Sun; Comeau, Donald C.; Kim, Won; Yeganova, Lana; Lu, Zhiyong; Wilbur, W. John
author_sort	Liu, Wanli
collection	PubMed
description	Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, we designed an author name disambiguation system consisting of similarity estimation and agglomerative clustering. A machine-learning method was employed to score the features for disambiguating a pair of papers with ambiguous names. These features enable the computation of pairwise similarity scores to estimate the probability of a pair of papers belonging to the same author, which drives an agglomerative clustering algorithm regulated by two factors: name compatibility and probability level. With transitivity violation correction, high-precision author clustering is achieved by focusing on minimizing false-positive pairing. Disambiguation performance is evaluated with manual verification of random samples of pairs from clustering results. When compared with a state-of-the-art system, our evaluation shows that among all the pairs the lumping error rate drops from 10.1% to 2.2% for our system, while the splitting error rises from 1.8% to 7.7%. This results in an overall error rate of 9.9%, compared with 11.9% for the state-of-the-art method. Other evaluations based on gold standard data also show the increase in accuracy of our clustering. We attribute the performance improvement to the machine-learning method driven by a large-scale training set and the clustering algorithm regulated by a name compatibility scheme preferring precision. With integration of the author name disambiguation system into the PubMed search engine, the overall click-through rate of PubMed users on author name query results improved from 34.9% to 36.9%.
format	Online Article Text
id	pubmed-5530597
institution	National Center for Biotechnology Information
language	English
publishDate	2013
record_format	MEDLINE/PubMed
spelling	pubmed-55305972017-07-27 Author Name Disambiguation for PubMed Liu, Wanli Islamaj Doğan, Rezarta Kim, Sun Comeau, Donald C. Kim, Won Yeganova, Lana Lu, Zhiyong Wilbur, W. John J Assoc Inf Sci Technol Article Log analysis shows that PubMed users frequently use author names in queries for retrieving scientific literature. However, author name ambiguity may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, we designed an author name disambiguation system consisting of similarity estimation and agglomerative clustering. A machine-learning method was employed to score the features for disambiguating a pair of papers with ambiguous names. These features enable the computation of pairwise similarity scores to estimate the probability of a pair of papers belonging to the same author, which drives an agglomerative clustering algorithm regulated by 2 factors: name compatibility and probability level. With transitivity violation correction, high precision author clustering is achieved by focusing on minimizing false-positive pairing. Disambiguation performance is evaluated with manual verification of random samples of pairs from clustering results. When compared with a state-of-the-art system, our evaluation shows that among all the pairs the lumping error rate drops from 10.1% to 2.2% for our system, while the splitting error rises from 1.8% to 7.7%. This results in an overall error rate of 9.9%, compared with 11.9% for the state-of-the-art method. Other evaluations based on gold standard data also show the increase in accuracy of our clustering. We attribute the performance improvement to the machine-learning method driven by a large-scale training set and the clustering algorithm regulated by a name compatibility scheme preferring precision. 
With integration of the author name disambiguation system into the PubMed search engine, the overall click-through-rate of PubMed users on author name query results improved from 34.9% to 36.9%. 2013-11-21 2014-04 /pmc/articles/PMC5530597/ /pubmed/28758138 http://dx.doi.org/10.1002/asi.23063 Text en Re-use of this article is permitted in accordance with the Terms and Conditions set out at http://olabout.wiley.com/WileyCDA/Section/id-817008.html. http://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Article Liu, Wanli Islamaj Doğan, Rezarta Kim, Sun Comeau, Donald C. Kim, Won Yeganova, Lana Lu, Zhiyong Wilbur, W. John Author Name Disambiguation for PubMed
title	Author Name Disambiguation for PubMed
title_full	Author Name Disambiguation for PubMed
title_fullStr	Author Name Disambiguation for PubMed
title_full_unstemmed	Author Name Disambiguation for PubMed
title_short	Author Name Disambiguation for PubMed
title_sort	author name disambiguation for pubmed
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5530597/ https://www.ncbi.nlm.nih.gov/pubmed/28758138 http://dx.doi.org/10.1002/asi.23063
work_keys_str_mv	AT liuwanli authornamedisambiguationforpubmed AT islamajdoganrezarta authornamedisambiguationforpubmed AT kimsun authornamedisambiguationforpubmed AT comeaudonaldc authornamedisambiguationforpubmed AT kimwon authornamedisambiguationforpubmed AT yeganovalana authornamedisambiguationforpubmed AT luzhiyong authornamedisambiguationforpubmed AT wilburwjohn authornamedisambiguationforpubmed
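The description field above outlines a two-stage pipeline: a learned model estimates, for each pair of papers sharing an ambiguous name, the probability that they belong to the same author, and an agglomerative clustering step regulated by name compatibility and a probability level then groups the papers. The Python below is a minimal sketch of that clustering stage only, under loudly stated assumptions. The pairwise probabilities are taken as given (the paper's machine-learning scorer is not reproduced). The `initials_compatible` rule, the average-linkage merge criterion, and the 0.5 default threshold are hypothetical illustrative choices, not the authors' method. The paper's transitivity violation correction is omitted.

```python
from itertools import combinations


def initials_compatible(a: str, b: str) -> bool:
    """HYPOTHETICAL name-compatibility rule (not from the paper).

    Names are 'Last, Given Middle'. They are compatible when last names match
    (case-insensitively) and every given-name token in the shared positions
    agrees: full tokens must be equal, and an initial must match the first
    letter of the other token.
    """
    last_a, _, given_a = a.partition(",")
    last_b, _, given_b = b.partition(",")
    if last_a.strip().lower() != last_b.strip().lower():
        return False
    toks_a = given_a.replace(".", " ").split()
    toks_b = given_b.replace(".", " ").split()
    for ta, tb in zip(toks_a, toks_b):
        ta, tb = ta.lower(), tb.lower()
        if len(ta) == 1 or len(tb) == 1:
            if ta[0] != tb[0]:  # initial vs. initial or initial vs. full name
                return False
        elif ta != tb:  # two full given names must be identical
            return False
    return True


def cluster(names, prob, threshold=0.5):
    """Greedy average-linkage agglomeration (an ASSUMED merge criterion).

    names: one author-name string per paper (paper i has names[i]).
    prob: dict mapping (i, j) with i < j to the estimated probability that
          papers i and j share an author; missing pairs count as 0.0.
    Repeatedly merges the two clusters with the highest mean cross-pair
    probability, but only if that mean is >= threshold and every cross-cluster
    name pair is compatible. Returns a list of clusters (lists of paper ids).
    """
    clusters = [[i] for i in range(len(names))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Name compatibility gate: any incompatible pair blocks the merge.
            if not all(initials_compatible(names[i], names[j]) for i in ca for j in cb):
                continue
            scores = [prob.get((min(i, j), max(i, j)), 0.0) for i in ca for j in cb]
            avg = sum(scores) / len(scores)
            # Probability-level gate plus "best pair first" selection.
            if avg >= threshold and (best is None or avg > best[0]):
                best = (avg, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a (from combinations), so index a stays valid
```

With hypothetical inputs `names = ["Kim, Sun", "Kim, S.", "Kim, Won", "Kim, W"]` and probabilities of 0.9 for papers (0, 1), 0.8 for (2, 3), and low values elsewhere, `cluster(names, prob)` returns `[[0, 1], [2, 3]]`; raising `threshold` to 0.85 leaves papers 2 and 3 unmerged. Letting any incompatible cross pair veto a merge trades recall for precision, which mirrors, in simplified form, the record's stated preference for minimizing false-positive pairing (fewer lumping errors at the cost of more splitting errors).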


Similar items