
Pinpointing disease genes through phenomic and genomic data fusion

BACKGROUND: Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and gene...


Bibliographic Details
Main Authors: Jiang, Rui, Wu, Mengmeng, Li, Lianshuo
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331717/
https://www.ncbi.nlm.nih.gov/pubmed/25708473
http://dx.doi.org/10.1186/1471-2164-16-S2-S3
author Jiang, Rui
Wu, Mengmeng
Li, Lianshuo
collection PubMed
description BACKGROUND: Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed that either rely on the guilt-by-association principle or make use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has prevented whole-genome scans for causative genes for a significant proportion of diseases.
RESULTS: To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion.
CONCLUSIONS: pgFusion not only provided an effective way to prioritize candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.
format Online
Article
Text
id pubmed-4331717
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331717 2015-03-19
Pinpointing disease genes through phenomic and genomic data fusion
Jiang, Rui
Wu, Mengmeng
Li, Lianshuo
BMC Genomics
Proceedings
BioMed Central 2015-01-21
/pmc/articles/PMC4331717/
/pubmed/25708473
http://dx.doi.org/10.1186/1471-2164-16-S2-S3
Text en
Copyright © 2015 Jiang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Pinpointing disease genes through phenomic and genomic data fusion
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331717/
https://www.ncbi.nlm.nih.gov/pubmed/25708473
http://dx.doi.org/10.1186/1471-2164-16-S2-S3