
Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS

BACKGROUND: The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current metho...


Bibliographic Details
Main Authors:	Rodgers-Melnick, Eli; Culp, Mark; DiFazio, Stephen P
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848842/ https://www.ncbi.nlm.nih.gov/pubmed/24015873 http://dx.doi.org/10.1186/1471-2164-14-608

author	Rodgers-Melnick, Eli; Culp, Mark; DiFazio, Stephen P
collection	PubMed
description	BACKGROUND: The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current methods of PPI prediction tend to be highly conservative or require large amounts of functional data that may not be available for newly-sequenced organisms. RESULTS: In this study we demonstrate a random-forest based technique, ENTS, for the computational prediction of protein-protein interactions based only on primary sequence data. Our approach is able to efficiently predict interactions on a whole-genome scale for any eukaryotic organism, using pairwise combinations of conserved domains and predicted subcellular localization of proteins as input features. We present the first predicted interactome for the forest tree Populus trichocarpa in addition to the predicted interactomes for Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Arabidopsis thaliana. Comparing our approach to other PPI predictors, we find that ENTS performs comparably to or better than a number of existing approaches, including several that utilize a variety of functional information for their predictions. We also find that the predicted interactions are biologically meaningful, as indicated by similarity in functional annotations and enrichment of co-expressed genes in public microarray datasets. Furthermore, we demonstrate some of the biological insights that can be gained from these predicted interaction networks. We show that the predicted interactions yield informative groupings of P. trichocarpa metabolic pathways, literature-supported associations among human disease states, and theory-supported insight into the evolutionary dynamics of duplicated genes in paleopolyploid plants. 
CONCLUSION: We conclude that the ENTS classifier will be a valuable tool for the de novo annotation of genome sequences, providing initial clues about regulatory and metabolic network topology, and revealing relationships that are not immediately obvious from traditional homology-based annotations.
format	Online Article Text
id	pubmed-3848842
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3848842 2013-12-06 Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS Rodgers-Melnick, Eli; Culp, Mark; DiFazio, Stephen P BMC Genomics Software BioMed Central 2013-09-10 /pmc/articles/PMC3848842/ /pubmed/24015873 http://dx.doi.org/10.1186/1471-2164-14-608 Text en Copyright © 2013 Rodgers-Melnick et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848842/ https://www.ncbi.nlm.nih.gov/pubmed/24015873 http://dx.doi.org/10.1186/1471-2164-14-608
