On expert curation and scalability: UniProtKB/Swiss-Prot as a case study

Bibliographic Details
Main authors: Poux, Sylvain; Arighi, Cecilia N; Magrane, Michele; Bateman, Alex; Wei, Chih-Hsuan; Lu, Zhiyong; Boutet, Emmanuel; Bye-A-Jee, Hema; Famiglietti, Maria Livia; Roechert, Bernd; The UniProt Consortium
Format: Online article (text)
Language: English
Published: Oxford University Press, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860168/
https://www.ncbi.nlm.nih.gov/pubmed/29036270
http://dx.doi.org/10.1093/bioinformatics/btx439

Abstract:
MOTIVATION: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches.

RESULTS: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000–10 000 papers are curated in UniProt each year while curators evaluate 50 000–70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2–3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable.

AVAILABILITY AND IMPLEMENTATION: UniProt is freely available at http://www.uniprot.org/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal: Bioinformatics (section: Original Papers)
Dates: published online 2017-07-13; issue date 2017-11-01
Copyright: © The Author 2017. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.