Literature curation of protein interactions: measuring agreement across major public databases

Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these dif...

Bibliographic Details
Main authors: Turinsky, Andrei L., Razick, Sabry, Turner, Brian, Donaldson, Ian M., Wodak, Shoshana J.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3011985/
https://www.ncbi.nlm.nih.gov/pubmed/21183497
http://dx.doi.org/10.1093/database/baq026
_version_ 1782195054572470272
author Turinsky, Andrei L.
Razick, Sabry
Turner, Brian
Donaldson, Ian M.
Wodak, Shoshana J.
collection PubMed
description Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb
format Text
id pubmed-3011985
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-30119852010-12-29 Literature curation of protein interactions: measuring agreement across major public databases Turinsky, Andrei L. Razick, Sabry Turner, Brian Donaldson, Ian M. Wodak, Shoshana J. Database (Oxford) Original Article Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb Oxford University Press 2010-12-22 /pmc/articles/PMC3011985/ /pubmed/21183497 http://dx.doi.org/10.1093/database/baq026 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Turinsky, Andrei L.
Razick, Sabry
Turner, Brian
Donaldson, Ian M.
Wodak, Shoshana J.
Literature curation of protein interactions: measuring agreement across major public databases
title Literature curation of protein interactions: measuring agreement across major public databases
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3011985/
https://www.ncbi.nlm.nih.gov/pubmed/21183497
http://dx.doi.org/10.1093/database/baq026