Cargando…

Network-based prediction and knowledge mining of disease genes

BACKGROUND: In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of ne...

Descripción completa

Detalles Bibliográficos
Autores principales:	Carson, Matthew B, Lu, Hui
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2015
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460923/ https://www.ncbi.nlm.nih.gov/pubmed/26043920 http://dx.doi.org/10.1186/1755-8794-8-S2-S9

_version_	1782375463386087424
author	Carson, Matthew B Lu, Hui
author_facet	Carson, Matthew B Lu, Hui
author_sort	Carson, Matthew B
collection	PubMed
description	BACKGROUND: In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. METHODS: We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. RESULTS: Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. CONCLUSIONS: We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions.
format	Online Article Text
id	pubmed-4460923
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-44609232015-06-29 Network-based prediction and knowledge mining of disease genes Carson, Matthew B Lu, Hui BMC Med Genomics Research BACKGROUND: In recent years, high-throughput protein interaction identification methods have generated a large amount of data. When combined with the results from other in vivo and in vitro experiments, a complex set of relationships between biological molecules emerges. The growing popularity of network analysis and data mining has allowed researchers to recognize indirect connections between these molecules. Due to the interdependent nature of network entities, evaluating proteins in this context can reveal relationships that may not otherwise be evident. METHODS: We examined the human protein interaction network as it relates to human illness using the Disease Ontology. After calculating several topological metrics, we trained an alternating decision tree (ADTree) classifier to identify disease-associated proteins. Using a bootstrapping method, we created a tree to highlight conserved characteristics shared by many of these proteins. Subsequently, we reviewed a set of non-disease-associated proteins that were misclassified by the algorithm with high confidence and searched for evidence of a disease relationship. RESULTS: Our classifier was able to predict disease-related genes with 79% area under the receiver operating characteristic (ROC) curve (AUC), which indicates the tradeoff between sensitivity and specificity and is a good predictor of how a classifier will perform on future data sets. We found that a combination of several network characteristics including degree centrality, disease neighbor ratio, eccentricity, and neighborhood connectivity help to distinguish between disease- and non-disease-related proteins. Furthermore, the ADTree allowed us to understand which combinations of strongly predictive attributes contributed most to protein-disease classification. In our post-processing evaluation, we found several examples of potential novel disease-related proteins and corresponding literature evidence. In addition, we showed that first- and second-order neighbors in the PPI network could be used to identify likely disease associations. CONCLUSIONS: We analyzed the human protein interaction network and its relationship to disease and found that both the number of interactions with other proteins and the disease relationship of neighboring proteins helped to determine whether a protein had a relationship to disease. Our classifier predicted many proteins with no annotated disease association to be disease-related, which indicated that these proteins have network characteristics that are similar to disease-related proteins and may therefore have disease associations not previously identified. By performing a post-processing step after the prediction, we were able to identify evidence in literature supporting this possibility. This method could provide a useful filter for experimentalists searching for new candidate protein targets for drug repositioning and could also be extended to include other network and data types in order to refine these predictions. BioMed Central 2015-05-29 /pmc/articles/PMC4460923/ /pubmed/26043920 http://dx.doi.org/10.1186/1755-8794-8-S2-S9 Text en Copyright © 2015 Carson and Lu; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Carson, Matthew B Lu, Hui Network-based prediction and knowledge mining of disease genes
title	Network-based prediction and knowledge mining of disease genes
title_full	Network-based prediction and knowledge mining of disease genes
title_fullStr	Network-based prediction and knowledge mining of disease genes
title_full_unstemmed	Network-based prediction and knowledge mining of disease genes
title_short	Network-based prediction and knowledge mining of disease genes
title_sort	network-based prediction and knowledge mining of disease genes
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460923/ https://www.ncbi.nlm.nih.gov/pubmed/26043920 http://dx.doi.org/10.1186/1755-8794-8-S2-S9
work_keys_str_mv	AT carsonmatthewb networkbasedpredictionandknowledgeminingofdiseasegenes AT luhui networkbasedpredictionandknowledgeminingofdiseasegenes

Network-based prediction and knowledge mining of disease genes

Ejemplares similares