Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data
BACKGROUND: High throughput experiments resulted in many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed and worked on various types of genomic data sources. As a single source of gen...
Main Authors: | Li, Yongjin, Li, Jinyan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521411/ https://www.ncbi.nlm.nih.gov/pubmed/23282070 http://dx.doi.org/10.1186/1471-2164-13-S7-S27 |
_version_ | 1782252951091281920 |
---|---|
author | Li, Yongjin Li, Jinyan |
author_facet | Li, Yongjin Li, Jinyan |
author_sort | Li, Yongjin |
collection | PubMed |
description | BACKGROUND: High throughput experiments resulted in many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed and worked on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is highly demanded to accomplish reliable disease gene identification. RESULTS: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which is capable of connecting multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new idea of random walk is then developed to work on multigraphs using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. CONCLUSIONS: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization. |
format | Online Article Text |
id | pubmed-3521411 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-35214112012-12-14 Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data Li, Yongjin Li, Jinyan BMC Genomics Proceedings BACKGROUND: High throughput experiments resulted in many genomic datasets and hundreds of candidate disease genes. To discover the real disease genes from a set of candidate genes, computational methods have been proposed and worked on various types of genomic data sources. As a single source of genomic data is prone to bias, incompleteness and noise, integration of different genomic data sources is highly demanded to accomplish reliable disease gene identification. RESULTS: In contrast to the commonly adopted data integration approach, which integrates separate lists of candidate genes derived from each single data source, we merge various genomic networks into a multigraph, which is capable of connecting multiple edges between a pair of nodes. This novel approach provides a data platform with strong noise tolerance to prioritize the disease genes. A new idea of random walk is then developed to work on multigraphs using a modified step to calculate the transition matrix. Our method is further enhanced to deal with heterogeneous data types by allowing cross-walk between phenotype and gene networks. Compared on benchmark datasets, our method is shown to be more accurate than the state-of-the-art methods in disease gene identification. We also conducted a case study to identify disease genes for Insulin-Dependent Diabetes Mellitus. Some of the newly identified disease genes are supported by recently published literature. CONCLUSIONS: The proposed RWRM (Random Walk with Restart on Multigraphs) model and CHN (Complex Heterogeneous Network) model are effective in data integration for candidate gene prioritization. 
BioMed Central 2012-12-07 /pmc/articles/PMC3521411/ /pubmed/23282070 http://dx.doi.org/10.1186/1471-2164-13-S7-S27 Text en Copyright ©2012 Li et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Li, Yongjin Li, Jinyan Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data |
title | Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data |
title_sort | disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521411/ https://www.ncbi.nlm.nih.gov/pubmed/23282070 http://dx.doi.org/10.1186/1471-2164-13-S7-S27 |
work_keys_str_mv | AT liyongjin diseasegeneidentificationbyrandomwalkonmultigraphsmergingheterogeneousgenomicandphenotypedata AT lijinyan diseasegeneidentificationbyrandomwalkonmultigraphsmergingheterogeneousgenomicandphenotypedata |
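
The abstract describes a random walk with restart on a multigraph, where a pair of nodes may be joined by several edges, one per data source. This record does not give the paper's actual transition-matrix formula, so the sketch below is only an illustration under a stated assumption: parallel edges are counted, and the walker leaves a node in proportion to those counts. The function name `rwr_on_multigraph`, its parameters, and the toy gene labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rwr_on_multigraph(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected multigraph (illustrative).

    edges: iterable of (u, v) pairs; a repeated pair is a parallel edge.
    seeds: nodes where the walk restarts, e.g. known disease genes.
    """
    # Assumption (not from the paper): the weight of a node pair is the
    # number of parallel edges between them.
    weight = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        weight[u][v] += 1.0
        weight[v][u] += 1.0

    nodes = sorted(weight)
    seeds = [s for s in seeds if s in weight]
    if not seeds:
        raise ValueError("no seed node is present in the graph")

    # Restart vector: uniform probability over the seed nodes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    degree = {n: sum(weight[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed; otherwise move
        # to a neighbour in proportion to the edge multiplicity.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            for v, w in weight[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / degree[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy example: A-B is supported by two sources, A-C and C-D by one each.
toy_edges = [("A", "B"), ("A", "B"), ("A", "C"), ("C", "D")]
scores = rwr_on_multigraph(toy_edges, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the doubly supported neighbour B ends up ranked above C, which reflects the intuition that agreement between data sources should strengthen a link. The restart probability of 0.7 is an arbitrary choice for the example.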