Ensemble disease gene prediction by clinical sample-based networks

BACKGROUND: Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the cost and time of experimental validation. Since proteins (products of genes) usually work together to achieve a specific fun...

Bibliographic Details
Main Authors:	Luo, Ping; Tian, Li-Ping; Chen, Bolin; Xiao, Qianghua; Wu, Fang-Xiang
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7068856/ https://www.ncbi.nlm.nih.gov/pubmed/32164526 http://dx.doi.org/10.1186/s12859-020-3346-8

_version_	1783505656987254784
author	Luo, Ping; Tian, Li-Ping; Chen, Bolin; Xiao, Qianghua; Wu, Fang-Xiang
author_facet	Luo, Ping; Tian, Li-Ping; Chen, Bolin; Xiao, Qianghua; Wu, Fang-Xiang
author_sort	Luo, Ping
collection	PubMed
description	BACKGROUND: Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the cost and time of experimental validation. Since proteins (products of genes) usually work together to achieve a specific function, biomolecular networks, such as the protein-protein interaction (PPI) network and gene co-expression networks, are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and that PPIs in different patients should also differ. RESULTS: To address these issues, we develop an ensemble algorithm to predict disease genes from clinical sample-based networks (EdgCSN). The algorithm first constructs a single sample-based network for each case sample of the disease under study. Then, these single sample-based networks are merged into several fused networks based on the clustering results of the samples. After that, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. EdgCSN is evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer’s disease (AD) and obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much better than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes. CONCLUSIONS: In this study, we propose EdgCSN, an ensemble learning algorithm for predicting disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of the leave-one-out cross validation show that EdgCSN performs much better than the competing algorithms in predicting BC-associated, TC-associated and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
format	Online Article Text
id	pubmed-7068856
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7068856 2020-03-18 Ensemble disease gene prediction by clinical sample-based networks Luo, Ping Tian, Li-Ping Chen, Bolin Xiao, Qianghua Wu, Fang-Xiang BMC Bioinformatics Research BioMed Central 2020-03-11 /pmc/articles/PMC7068856/ /pubmed/32164526 http://dx.doi.org/10.1186/s12859-020-3346-8 Text en © The Author(s) 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Luo, Ping Tian, Li-Ping Chen, Bolin Xiao, Qianghua Wu, Fang-Xiang Ensemble disease gene prediction by clinical sample-based networks
title	Ensemble disease gene prediction by clinical sample-based networks
title_full	Ensemble disease gene prediction by clinical sample-based networks
title_fullStr	Ensemble disease gene prediction by clinical sample-based networks
title_full_unstemmed	Ensemble disease gene prediction by clinical sample-based networks
title_short	Ensemble disease gene prediction by clinical sample-based networks
title_sort	ensemble disease gene prediction by clinical sample-based networks
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7068856/ https://www.ncbi.nlm.nih.gov/pubmed/32164526 http://dx.doi.org/10.1186/s12859-020-3346-8
work_keys_str_mv	AT luoping ensemblediseasegenepredictionbyclinicalsamplebasednetworks AT tianliping ensemblediseasegenepredictionbyclinicalsamplebasednetworks AT chenbolin ensemblediseasegenepredictionbyclinicalsamplebasednetworks AT xiaoqianghua ensemblediseasegenepredictionbyclinicalsamplebasednetworks AT wufangxiang ensemblediseasegenepredictionbyclinicalsamplebasednetworks
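
Illustrative sketch (not part of the catalog record): the description field outlines a pipeline in which centrality features are extracted from several fused networks, one logistic model is trained per network, and an ensemble step produces the final probability for each gene. The Python below is a minimal toy version of that idea under stated assumptions, not the authors' EdgCSN implementation. The two small networks, the gene names A to F, the labels, the choice of degree and closeness centrality as features, the gradient-descent settings, and the averaging of probabilities as the ensemble rule are all hypothetical; the record does not specify any of them, and the construction of single sample-based networks and the clustering step are omitted.

```python
import math
from collections import deque


def build_network(genes, edges):
    """Undirected network stored as {gene: set of neighbours}."""
    adj = {g: set() for g in genes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def centrality_features(adj):
    """Return {gene: [degree centrality, closeness centrality]}."""
    n = len(adj)
    feats = {}
    for source in adj:
        # Breadth-first search gives shortest-path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        closeness = (len(dist) - 1) / total if total else 0.0
        feats[source] = [len(adj[source]) / (n - 1), closeness]
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic loss (assumed settings)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def ensemble_scores(networks, labels):
    """Train one model per network, then average the predicted probabilities.

    Assumes every network contains the same set of genes.
    """
    train_genes = sorted(labels)
    per_network = []
    for adj in networks:
        feats = centrality_features(adj)
        model = train_logistic([feats[g] for g in train_genes],
                               [labels[g] for g in train_genes])
        per_network.append({g: predict(model, feats[g]) for g in adj})
    return {g: sum(p[g] for p in per_network) / len(per_network)
            for g in per_network[0]}


# Hypothetical toy data: two "fused" networks over the same six genes.
genes = ["A", "B", "C", "D", "E", "F"]
net1 = build_network(genes, [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
                             ("B", "D"), ("D", "E"), ("E", "F")])
net2 = build_network(genes, [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C"),
                             ("B", "D"), ("C", "D"), ("E", "F")])
labels = {"A": 1, "B": 1, "E": 0, "F": 0}  # 1 = known disease gene (made up)

scores = ensemble_scores([net1, net2], labels)
for gene in genes:
    print(gene, round(scores[gene], 3))
```

Genes C and D carry no label in this toy setup, so their averaged scores play the role of predictions for candidate genes.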