A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records

BACKGROUND: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile...

Bibliographic Details
Main authors:	Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4181406/ https://www.ncbi.nlm.nih.gov/pubmed/25253562 http://dx.doi.org/10.1186/1471-2105-15-315

author	Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter
author_sort	Jiang, Li
collection	PubMed
description	BACKGROUND: Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. RESULTS: We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. CONCLUSION: We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-315) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4181406
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4181406 2014-10-03 A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records. Jiang, Li; Edwards, Stefan M; Thomsen, Bo; Workman, Christopher T; Guldbrandtsen, Bernt; Sørensen, Peter. BMC Bioinformatics. Research Article. BioMed Central 2014-09-24 /pmc/articles/PMC4181406/ /pubmed/25253562 http://dx.doi.org/10.1186/1471-2105-15-315 Text en © Jiang et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records
title_sort	random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of generif, omim and pubmed records
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4181406/ https://www.ncbi.nlm.nih.gov/pubmed/25253562 http://dx.doi.org/10.1186/1471-2105-15-315
