
In Silico Gene Prioritization by Integrating Multiple Data Sources

Identifying disease genes is crucial to the understanding of disease pathogenesis, and to the improvement of disease diagnosis and treatment. In recent years, many researchers have proposed approaches to prioritize candidate genes by considering the relationship of candidate genes and existing known...


Bibliographic Details
Main authors: Chen, Yixuan; Wang, Wenhui; Zhou, Yingyao; Shields, Robert; Chanda, Sumit K.; Elston, Robert C.; Li, Jing
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3123338/
https://www.ncbi.nlm.nih.gov/pubmed/21731658
http://dx.doi.org/10.1371/journal.pone.0021137
_version_ 1782206972597108736
author Chen, Yixuan
Wang, Wenhui
Zhou, Yingyao
Shields, Robert
Chanda, Sumit K.
Elston, Robert C.
Li, Jing
author_sort Chen, Yixuan
collection PubMed
description Identifying disease genes is crucial to understanding disease pathogenesis and to improving disease diagnosis and treatment. In recent years, many researchers have proposed approaches that prioritize candidate genes by considering the relationships between candidate genes and known disease genes, as reflected in other data sources. In this paper, we propose an expandable framework for gene prioritization that can integrate multiple heterogeneous data sources by taking advantage of a unified graphic representation. Gene-gene relationships and gene-disease relationships are then defined based on the overall topology of each network using a diffusion kernel measure. These relationship measures are in turn normalized to derive an overall measure across all networks, which is used to rank all candidate genes. Based on the informativeness of the available data sources with respect to each specific disease, we also propose an adaptive threshold score to select a small subset of candidate genes for further validation studies. We performed a large-scale cross-validation analysis on 110 disease families using three data sources. The results show that our approach consistently outperforms two other state-of-the-art programs. A case study using Parkinson disease (PD) identified four candidate genes (UBB, SEPT5, GPR37 and TH) that ranked higher than our adaptive threshold, all of which are involved in the PD pathway. In particular, a very recent study observed a deletion of TH in a patient with PD, which supports the importance of the TH gene in PD pathogenesis. A web tool has been implemented to assist scientists in their genetic studies.
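As an illustration of the pipeline the abstract describes (per-network diffusion kernel, normalization, combined ranking), here is a minimal Python sketch. The abstract does not give the exact kernel parameters, the gene-disease aggregation, or the normalization used by the authors, so the choices below are assumptions: the kernel is exp(-beta * L) with L the unnormalized graph Laplacian, a gene's disease score is its summed kernel similarity to the known disease genes, and scores are z-scored within each network before being added. All function names and the `beta` parameter are hypothetical.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Return exp(-beta * L) for the graph Laplacian L = D - A (assumed form)."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so its matrix exponential follows from the eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def gene_disease_scores(kernel, disease_idx):
    """Score each gene by its summed kernel similarity to the known disease genes."""
    return kernel[:, disease_idx].sum(axis=1)


def combined_ranking(adjacencies, disease_idx, candidates, beta=0.5):
    """Rank candidate genes by the sum of per-network z-scored disease scores."""
    total = np.zeros(len(candidates))
    for a in adjacencies:
        kernel = diffusion_kernel(a, beta)
        scores = gene_disease_scores(kernel, disease_idx)[candidates]
        std = scores.std()
        # Assumed normalization: z-score within each network so networks are comparable.
        total += (scores - scores.mean()) / std if std > 0 else 0.0
    order = np.argsort(-total)  # highest combined score first
    return [candidates[i] for i in order], total[order]
```

For example, with a five-gene path network `0-1-2-3-4`, gene `0` as the only known disease gene, and genes `1` to `4` as candidates, `combined_ranking([path], [0], [1, 2, 3, 4])` orders the candidates by their proximity to gene `0`. The adaptive threshold mentioned in the abstract is not sketched here because the abstract does not say how it is computed.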
format Online
Article
Text
id pubmed-3123338
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3123338 2011-06-30 In Silico Gene Prioritization by Integrating Multiple Data Sources Chen, Yixuan Wang, Wenhui Zhou, Yingyao Shields, Robert Chanda, Sumit K. Elston, Robert C. Li, Jing PLoS One Research Article
Public Library of Science 2011-06-24 /pmc/articles/PMC3123338/ /pubmed/21731658 http://dx.doi.org/10.1371/journal.pone.0021137 Text en Chen et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title In Silico Gene Prioritization by Integrating Multiple Data Sources
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3123338/
https://www.ncbi.nlm.nih.gov/pubmed/21731658
http://dx.doi.org/10.1371/journal.pone.0021137