A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network

BACKGROUND: Prediction of pathogenic genes is crucial for disease prevention, diagnosis, and treatment. However, traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main...

Bibliographic Details
Main Authors: Xu, Bo; Liu, Yu; Yu, Shuo; Wang, Lei; Dong, Jie; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Xia, Feng
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927107/
https://www.ncbi.nlm.nih.gov/pubmed/31865919
http://dx.doi.org/10.1186/s12920-019-0627-z
author Xu, Bo
Liu, Yu
Yu, Shuo
Wang, Lei
Dong, Jie
Lin, Hongfei
Yang, Zhihao
Wang, Jian
Xia, Feng
author_sort Xu, Bo
collection PubMed
description BACKGROUND: Prediction of pathogenic genes is crucial for disease prevention, diagnosis, and treatment. However, traditional genetic localization methods are often technically difficult and time-consuming. With the development of computer science, computational biology has gradually become one of the main methods for finding candidate pathogenic genes. METHODS: We propose a pathogenic gene prediction method based on network embedding, called Multipath2vec. First, we construct a heterogeneous network, called the GP-network, based on three kinds of relationships between genes and phenotypes: correlations between phenotypes, interactions between genes, and known gene-phenotype pairs. Then, to embed the network better, we design the multi-path to guide random walks in the GP-network. The multi-path includes multiple paths between genes and phenotypes, which can capture the complex structural information of a heterogeneous network. Finally, we use the learned vector representation of each phenotype and protein to calculate similarities, and rank candidate genes according to their similarity to the target phenotype. RESULTS: We implemented Multipath2vec and four baseline approaches (i.e., CATAPULT, PRINCE, Deepwalk and Metapath2vec) on many-genes gene-phenotype data, single-gene gene-phenotype data and whole gene-phenotype data. Experimental results show that Multipath2vec outperformed the state-of-the-art baselines in the pathogenic gene prediction task. CONCLUSIONS: We propose Multipath2vec, which can be used to predict pathogenic genes; experimental results show that it achieves higher accuracy in pathogenic gene prediction.
format Online
Article
Text
id pubmed-6927107
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6927107 2019-12-30 BMC Med Genomics Research
BioMed Central 2019-12-23 /pmc/articles/PMC6927107/ /pubmed/31865919 http://dx.doi.org/10.1186/s12920-019-0627-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A network embedding model for pathogenic genes prediction by multi-path random walking on heterogeneous network
topic Research
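The METHODS summary in the description field outlines two mechanical steps: random walks on a gene-phenotype network guided by several paths, and ranking candidate genes by similarity to a target phenotype. The Python sketch below illustrates both on a toy network. It is not the authors' implementation: the node names, the edges, the path strings ("gpg", "pgp", "ggpg"), the function names, and the hand-written vectors are all assumptions made for illustration, and the embedding step that would turn walks into vectors is omitted, since this record does not describe it.

```python
import math
import random

# Toy GP-network (hypothetical data). "g:" marks a gene, "p:" a phenotype.
EDGES = [
    ("g:G1", "g:G2"),  # gene-gene interaction
    ("g:G2", "g:G3"),
    ("p:P1", "p:P2"),  # phenotype-phenotype correlation
    ("g:G1", "p:P1"),  # known gene-phenotype pair
    ("g:G3", "p:P2"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> list of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def node_type(node):
    """Node type is the prefix before ':' ('g' or 'p')."""
    return node.split(":", 1)[0]


def guided_walk(adj, start, path, length, rng):
    """Walk from `start`, restricting each step to the node type that the
    path prescribes. The path starts and ends with the same type, so it is
    cycled without its last symbol (e.g. "gpg" gives g, p, g, p, ...)."""
    if path[0] != path[-1] or node_type(start) != path[0]:
        raise ValueError("path must be cyclic and match the start node type")
    cycle = path[:-1]
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type(n) == wanted]
        if not candidates:  # dead end: stop this walk early
            break
        walk.append(rng.choice(candidates))
    return walk


def multi_path_walks(adj, paths, walks_per_node, length, seed=0):
    """Generate walks for every path from every node of a matching type."""
    rng = random.Random(seed)
    walks = []
    for path in paths:
        for node in sorted(adj):
            if node_type(node) != path[0]:
                continue
            for _ in range(walks_per_node):
                walks.append(guided_walk(adj, node, path, length, rng))
    return walks


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_genes(vectors, phenotype, candidates):
    """Sort candidate genes by descending similarity to the phenotype."""
    target = vectors[phenotype]
    scored = [(gene, cosine(vectors[gene], target)) for gene in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for walk in multi_path_walks(adj, ["gpg", "pgp", "ggpg"], 1, 5):
        print(" -> ".join(walk))
    # Hand-written stand-ins for learned embeddings (assumption).
    vectors = {"p:P1": [1.0, 0.0], "g:G1": [0.9, 0.1], "g:G2": [0.0, 1.0]}
    print(rank_genes(vectors, "p:P1", ["g:G1", "g:G2"]))
```

In a fuller pipeline the walks would be fed to an embedding model to obtain the vectors; here they are supplied by hand so that the ranking step can be run on its own.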