Integrating node embeddings and biological annotations for genes to predict disease-gene associations

BACKGROUND: Predicting disease-causative genes (or simply, disease genes) has played a critical role in understanding the genetic basis of human diseases and in providing disease treatment guidelines. While various computational methods have been proposed for disease gene prediction, with the re...

Bibliographic Details
Main Authors: Ata, Sezin Kircali; Ou-Yang, Le; Fang, Yuan; Kwoh, Chee-Keong; Wu, Min; Li, Xiao-Li
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6311944/
https://www.ncbi.nlm.nih.gov/pubmed/30598097
http://dx.doi.org/10.1186/s12918-018-0662-y
author Ata, Sezin Kircali
Ou-Yang, Le
Fang, Yuan
Kwoh, Chee-Keong
Wu, Min
Li, Xiao-Li
collection PubMed
description BACKGROUND: Predicting disease-causative genes (or simply, disease genes) has played a critical role in understanding the genetic basis of human diseases and in providing disease treatment guidelines. Various computational methods have been proposed for disease gene prediction, and the recent increase in available biological information for genes strongly motivates leveraging these valuable data sources to extract useful information for accurately predicting disease genes. RESULTS: We present an integrative framework called N2VKO to predict disease genes. First, we learn node embeddings for genes from a protein-protein interaction (PPI) network by adapting the well-known representation learning method node2vec. Second, we combine the learned node embeddings with various biological annotations as a rich feature representation for genes, and subsequently build binary classification models for disease gene prediction. Finally, as the data for disease gene prediction are usually imbalanced (i.e., the number of causative genes for a specific disease is much smaller than that of its non-causative genes), we address this serious data imbalance by applying oversampling techniques to improve prediction performance. Comprehensive experiments demonstrate that our proposed N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases. CONCLUSIONS: In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, while integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance the prediction performance. In addition, a literature search on the predicted disease genes also supports the effectiveness of our proposed N2VKO framework for disease gene prediction.
format Online
Article
Text
id pubmed-6311944
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6311944 2019-01-07 BMC Syst Biol Research BioMed Central 2018-12-31 /pmc/articles/PMC6311944/ /pubmed/30598097 http://dx.doi.org/10.1186/s12918-018-0662-y Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Integrating node embeddings and biological annotations for genes to predict disease-gene associations
topic Research