Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion

Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period a...

Bibliographic Details
Main Authors:	Wang, Chunyu, Zhang, Jie, Wang, Xueping, Han, Ke, Guo, Maozu
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7010852/ https://www.ncbi.nlm.nih.gov/pubmed/32117433 http://dx.doi.org/10.3389/fgene.2020.00005

_version_	1783495957772500992
author	Wang, Chunyu Zhang, Jie Wang, Xueping Han, Ke Guo, Maozu
author_facet	Wang, Chunyu Zhang, Jie Wang, Xueping Han, Ke Guo, Maozu
author_sort	Wang, Chunyu
collection	PubMed
description	Complex diseases seriously affect people's physical and mental health, and discovering disease-causing genes has become a major research goal. Because traditional biomedical methods involve long experimental periods and high costs, researchers have drawn on advances in bioinformatics and biotechnology to propose many gene prioritization algorithms that mine pathogenic genes from large amounts of biological data. However, the currently known gene–disease association matrix is still very sparse and contains no evidence that particular genes and diseases are unrelated, which limits the predictive performance of gene prioritization algorithms. Based on the hypothesis that mutations in functionally related genes may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract gene and disease features from multiple data sources, compensating for the sparsity of the association data. On the other hand, based on the prior knowledge that most unknown gene–disease pairs are unrelated, we use the PU-Learning strategy to treat the unlabeled data as negative examples for biased learning. In the experiments, PUIMCHIF performed significantly better than other algorithms on three measures: precision, recall, and mean percentile ranking (MPR). In the top-100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. PUIMCHIF thus outperforms other methods, such as IMC and CATAPULT.
format	Online Article Text
id	pubmed-7010852
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7010852 2020-02-28 Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion Wang, Chunyu Zhang, Jie Wang, Xueping Han, Ke Guo, Maozu Front Genet Genetics Frontiers Media S.A. 2020-02-04 /pmc/articles/PMC7010852/ /pubmed/32117433 http://dx.doi.org/10.3389/fgene.2020.00005 Text en Copyright © 2020 Wang, Zhang, Wang, Han and Guo http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Wang, Chunyu Zhang, Jie Wang, Xueping Han, Ke Guo, Maozu Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title	Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title_full	Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title_fullStr	Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title_full_unstemmed	Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title_short	Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion
title_sort	pathogenic gene prediction algorithm based on heterogeneous information fusion
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7010852/ https://www.ncbi.nlm.nih.gov/pubmed/32117433 http://dx.doi.org/10.3389/fgene.2020.00005
work_keys_str_mv	AT wangchunyu pathogenicgenepredictionalgorithmbasedonheterogeneousinformationfusion AT zhangjie pathogenicgenepredictionalgorithmbasedonheterogeneousinformationfusion AT wangxueping pathogenicgenepredictionalgorithmbasedonheterogeneousinformationfusion AT hanke pathogenicgenepredictionalgorithmbasedonheterogeneousinformationfusion AT guomaozu pathogenicgenepredictionalgorithmbasedonheterogeneousinformationfusion
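
The description field in this record mentions two ideas that a short sketch may make more concrete: biased PU learning over a matrix-completion model with side features, and the mean percentile ranking (MPR) metric. The Python below is an illustrative sketch only and is not the authors' implementation. The function names, the squared-error loss, the weight `alpha` on unlabeled entries, the regularization strength `lam`, and the specific MPR definition (percentile of each known gene among all genes for its disease, where 0 is best) are all assumptions introduced here.

```python
import numpy as np


def biased_pu_imc_loss(P, X, Y, W, H, alpha=0.1, lam=0.01):
    """Weighted squared loss for a feature-based low-rank model (hypothetical sketch).

    P: (n_genes, n_diseases) binary matrix; 1 = known association, 0 = unlabeled.
    X: (n_genes, f_g) gene features; Y: (n_diseases, f_d) disease features.
    W: (f_g, k) and H: (f_d, k) are the low-rank factors being learned.
    Unlabeled entries are treated as negatives but down-weighted by alpha,
    which is the "biased" part of biased PU learning (assumed form).
    """
    scores = X @ W @ H.T @ Y.T                  # predicted association scores
    weights = np.where(P == 1, 1.0, alpha)      # full weight on known positives
    data_term = np.sum(weights * (P - scores) ** 2)
    reg_term = lam * (np.sum(W ** 2) + np.sum(H ** 2))
    return float(data_term + reg_term)


def mean_percentile_rank(scores, P):
    """Average percentile rank of known associations (assumed definition).

    For each disease (column), genes are ranked by descending score; the
    percentile of a gene is rank / (n_genes - 1), so 0.0 means top-ranked.
    Requires at least two genes and at least one known association.
    """
    n_genes = scores.shape[0]
    order = np.argsort(-scores, axis=0)         # best-scoring gene first
    ranks = np.empty_like(order)
    for d in range(scores.shape[1]):
        ranks[order[:, d], d] = np.arange(n_genes)
    percentiles = ranks / (n_genes - 1)
    return float(percentiles[P == 1].mean())
```

Under these assumptions, a lower MPR means known associations sit closer to the top of each disease's ranked gene list, and a smaller `alpha` expresses less confidence that an unlabeled gene–disease pair is truly unrelated.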

Similar Items