
Prioritizing disease genes with an improved dual label propagation framework

BACKGROUND: Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can help reveal the inherited basis of human diseases and facilitate drug development. Our motivation is inspired by the label propagation algorithm and the false positive p...


Bibliographic Details
Main Authors: Zhang, Yaogong, Liu, Jiahui, Liu, Xiaohu, Fan, Xin, Hong, Yuxiang, Wang, Yuan, Huang, YaLou, Xie, MaoQiang
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806269/
https://www.ncbi.nlm.nih.gov/pubmed/29422030
http://dx.doi.org/10.1186/s12859-018-2040-6
author Zhang, Yaogong
Liu, Jiahui
Liu, Xiaohu
Fan, Xin
Hong, Yuxiang
Wang, Yuan
Huang, YaLou
Xie, MaoQiang
collection PubMed
description BACKGROUND: Disease gene prioritization aims to identify potential disease-causing genes for a given phenotype, which can help reveal the inherited basis of human diseases and facilitate drug development. Our work is motivated by the label propagation algorithm and by the false positive protein-protein interactions (PPIs) that exist in the dataset. To the best of our knowledge, false positive PPIs have not previously been considered in disease gene prioritization. Label propagation has been successfully applied to prioritize disease-causing genes in previous network-based methods, which use basic label propagation, i.e. random walk, on networks in different ways. However, none of these methods can cope with a dataset that contains many false positive PPIs, because they all take the PPI network as a fixed input. This characteristic of the data source may cause large deviations in the results.
RESULTS: We propose IDLP, a novel network-based framework for prioritizing candidate disease genes. IDLP propagates labels throughout both the PPI network and the phenotype similarity network. It avoids failure of the method when few disease genes are known. In addition, IDLP models the bias caused by false positive protein interactions and other potential factors by treating the PPI network matrix and the phenotype similarity matrix as matrices to be learnt. By correcting the noise in the training matrices, it improves performance significantly. We conduct extensive experiments on OMIM datasets, in which IDLP demonstrates its effectiveness against eight state-of-the-art approaches. The robustness of IDLP is also validated through experiments with a disturbed PPI network. Furthermore, we search the literature to verify that the new genes predicted by IDLP are associated with the given diseases; the high prediction accuracy shows that IDLP can be a powerful tool to help biologists discover new disease genes.
CONCLUSIONS: IDLP is an effective method for disease gene prioritization, particularly for query phenotypes without known associated genes, which would be of great help in identifying disease genes for less-studied phenotypes.
AVAILABILITY: https://github.com/nkiip/IDLP
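The basic label propagation (random walk) that the abstract attributes to earlier network-based methods can be sketched as follows. This is a minimal illustration only, not the IDLP implementation: the toy network, the restart weight `alpha`, the symmetric normalization, and the function names are all assumptions made for this example. Note that the network is a fixed input here, which is exactly the limitation the abstract describes.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize an adjacency matrix: D^{-1/2} A D^{-1/2}."""
    degree = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(degree, dtype=float)
    nonzero = degree > 0  # isolated nodes keep a zero scaling factor
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]


def propagate_labels(adj, seeds, alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate y <- alpha * S y + (1 - alpha) * y0 until the scores settle.

    adj   : fixed gene-gene network (hypothetical toy PPI adjacency matrix)
    seeds : initial labels y0, 1.0 for genes already linked to the phenotype
    alpha : assumed weight given to neighbours versus the initial labels
    """
    s = normalize_adjacency(np.asarray(adj, dtype=float))
    y0 = np.asarray(seeds, dtype=float)
    y = y0.copy()
    for _ in range(max_iter):
        y_next = alpha * (s @ y) + (1.0 - alpha) * y0
        if np.abs(y_next - y).max() < tol:
            return y_next
        y = y_next
    return y


# Toy network: genes 0-1-2-3 form a chain, gene 4 has no interactions.
toy_ppi = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
known = [1, 0, 0, 0, 0]  # only gene 0 is a known disease gene

scores = propagate_labels(toy_ppi, known)
ranking = np.argsort(-scores)  # candidate genes ordered by propagated score
print(scores, ranking)
```

In this sketch, genes closer to the known disease gene receive higher scores, and a gene with no interactions receives none. A spurious edge in `toy_ppi` would shift the scores directly, since nothing in this baseline adjusts the network.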
format Online
Article
Text
id pubmed-5806269
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5806269 2018-02-15
BMC Bioinformatics
Methodology Article
BioMed Central 2018-02-08
/pmc/articles/PMC5806269/ /pubmed/29422030 http://dx.doi.org/10.1186/s12859-018-2040-6
Text en
© The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Prioritizing disease genes with an improved dual label propagation framework
topic Methodology Article