
Protein complex prediction via dense subgraphs and false positive analysis

Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale and high-throughput techniques have made it possible to...


Bibliographic Details
Main authors:	Hernandez, Cecilia, Mella, Carlos, Navarro, Gonzalo, Olivera-Nappa, Alvaro, Araya, Jaime
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5609739/ https://www.ncbi.nlm.nih.gov/pubmed/28937982 http://dx.doi.org/10.1371/journal.pone.0183460

_version_	1783265657235701760
author	Hernandez, Cecilia Mella, Carlos Navarro, Gonzalo Olivera-Nappa, Alvaro Araya, Jaime
author_facet	Hernandez, Cecilia Mella, Carlos Navarro, Gonzalo Olivera-Nappa, Alvaro Araya, Jaime
author_sort	Hernandez, Cecilia
collection	PubMed
description	Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale and high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biological experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning or overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time, or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our basis for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false-positive predicted complexes by searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
format	Online Article Text
id	pubmed-5609739
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5609739 2017-10-09 Protein complex prediction via dense subgraphs and false positive analysis Hernandez, Cecilia Mella, Carlos Navarro, Gonzalo Olivera-Nappa, Alvaro Araya, Jaime PLoS One Research Article Public Library of Science 2017-09-22 /pmc/articles/PMC5609739/ /pubmed/28937982 http://dx.doi.org/10.1371/journal.pone.0183460 Text en © 2017 Hernandez et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Hernandez, Cecilia Mella, Carlos Navarro, Gonzalo Olivera-Nappa, Alvaro Araya, Jaime Protein complex prediction via dense subgraphs and false positive analysis
title	Protein complex prediction via dense subgraphs and false positive analysis
title_full	Protein complex prediction via dense subgraphs and false positive analysis
title_fullStr	Protein complex prediction via dense subgraphs and false positive analysis
title_full_unstemmed	Protein complex prediction via dense subgraphs and false positive analysis
title_short	Protein complex prediction via dense subgraphs and false positive analysis
title_sort	protein complex prediction via dense subgraphs and false positive analysis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5609739/ https://www.ncbi.nlm.nih.gov/pubmed/28937982 http://dx.doi.org/10.1371/journal.pone.0183460
work_keys_str_mv	AT hernandezcecilia proteincomplexpredictionviadensesubgraphsandfalsepositiveanalysis AT mellacarlos proteincomplexpredictionviadensesubgraphsandfalsepositiveanalysis AT navarrogonzalo proteincomplexpredictionviadensesubgraphsandfalsepositiveanalysis AT oliveranappaalvaro proteincomplexpredictionviadensesubgraphsandfalsepositiveanalysis AT arayajaime proteincomplexpredictionviadensesubgraphsandfalsepositiveanalysis
