
Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data

BACKGROUND: High throughput experiments resulted in many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed and worked on various types of genomic data sources. As a single source of gen...


Bibliographic Details
Main Authors: Li, Yongjin, Li, Jinyan
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521411/
https://www.ncbi.nlm.nih.gov/pubmed/23282070
http://dx.doi.org/10.1186/1471-2164-13-S7-S27
_version_ 1782252951091281920
author Li, Yongjin
Li, Jinyan
author_facet Li, Yongjin
Li, Jinyan
author_sort Li, Yongjin
collection PubMed
description BACKGROUND: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed that work on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is needed to accomplish reliable disease gene identification. RESULTS: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new random walk is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. CONCLUSIONS: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.
format Online
Article
Text
id pubmed-3521411
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3521411 2012-12-14 Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data Li, Yongjin Li, Jinyan BMC Genomics Proceedings BACKGROUND: High-throughput experiments have produced many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed that work on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is needed to accomplish reliable disease gene identification. RESULTS: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which can hold multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new random walk is then developed to work on multigraphs, using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. CONCLUSIONS: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization.
BioMed Central 2012-12-07 /pmc/articles/PMC3521411/ /pubmed/23282070 http://dx.doi.org/10.1186/1471-2164-13-S7-S27 Text en Copyright ©2012 Li et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Li, Yongjin
Li, Jinyan
Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data
title Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521411/
https://www.ncbi.nlm.nih.gov/pubmed/23282070
http://dx.doi.org/10.1186/1471-2164-13-S7-S27
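
The description field of this record names random walk with restart on a multigraph as the core technique. The following is a minimal sketch of that general idea, not the authors' RWRM implementation: the function name `rwr_on_multigraph`, the choice to combine parallel edges by summing their weights, the restart probability of 0.7, and the toy gene labels are all assumptions made for illustration. The record does not specify the paper's modified transition-matrix step, so it is not reproduced here.

```python
from collections import defaultdict


def rwr_on_multigraph(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected multigraph.

    edges: iterable of (u, v, weight) with positive weights; several
           edges between the same pair of nodes are allowed.
    seeds: nodes the walker restarts from (e.g. known disease genes).
    Returns a dict mapping each node to its steady-state probability.
    """
    # Assumption: parallel edges are combined by summing their weights.
    weight = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        weight[u][v] += w
        weight[v][u] += w

    nodes = sorted(weight)
    seed_set = {s for s in seeds if s in weight}
    if not seed_set:
        raise ValueError("no seed node is present in the graph")

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    # Total edge weight leaving each node, used to normalise transitions.
    strength = {n: sum(weight[n].values()) for n in nodes}

    p = dict(p0)
    for _ in range(max_iter):
        # p_next = restart * p0 + (1 - restart) * (one walk step from p)
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in weight[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy example: two parallel edges between A and B (two data sources agree).
edges = [("A", "B", 1.0), ("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0)]
scores = rwr_on_multigraph(edges, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Candidates would then be ranked by their score in `scores`; in this sketch a pair of nodes supported by several edges simply receives a proportionally larger share of the walker's transitions.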