
Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome

BACKGROUND: Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests f...


Bibliographic Details
Main authors: Ramani, Arun K, Bunescu, Razvan C, Mooney, Raymond J, Marcotte, Edward M
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175952/
https://www.ncbi.nlm.nih.gov/pubmed/15892868
http://dx.doi.org/10.1186/gb-2005-6-5-r40
author Ramani, Arun K
Bunescu, Razvan C
Mooney, Raymond J
Marcotte, Edward M
collection PubMed
description BACKGROUND: Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins. RESULTS: We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields, then interactions were identified by the co-occurrence of protein names across the set of Medline abstracts, filtering the interactions with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets. CONCLUSION: These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
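The projection at the end of the abstract is simple arithmetic, and a short sketch makes the "no more than 10%" claim easy to check. The constants below are the figures quoted in the abstract; the function and variable names are illustrative assumptions and do not come from the paper. The sketch follows the abstract's own multiplication (interactions per protein times gene count) and does not attempt to reproduce the authors' actual analysis.

```python
# Figures quoted in the abstract of the record above.
INTERACTIONS_PER_PROTEIN = 15        # "approximately 15 interactions per protein"
ESTIMATED_HUMAN_GENES = 25_000       # "estimated 25,000 human genes"
CONSOLIDATED_INTERACTIONS = 31_609   # size of the combined network


def projected_network_size(per_protein: int, genes: int) -> int:
    """Project the complete network size the way the abstract does:
    interactions per protein multiplied by the number of genes."""
    return per_protein * genes


def coverage(known: int, projected: int) -> float:
    """Fraction of the projected network covered by the known interactions."""
    return known / projected


projected = projected_network_size(INTERACTIONS_PER_PROTEIN, ESTIMATED_HUMAN_GENES)
covered = coverage(CONSOLIDATED_INTERACTIONS, projected)

print(f"projected interactions: {projected:,}")
print(f"coverage of consolidated set: {covered:.1%}")
```

Under these inputs the consolidated set covers a little over 8% of the projected network, which is consistent with the abstract's upper bound of 10%.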
format Text
id pubmed-1175952
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1175952 2005-07-17 Genome Biol Research BioMed Central 2005 2005-04-15 /pmc/articles/PMC1175952/ /pubmed/15892868 http://dx.doi.org/10.1186/gb-2005-6-5-r40 Text en Copyright © 2005 Marcotte et al.; licensee BioMed Central Ltd.
title Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome
topic Research