pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion

MOTIVATION: Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool, integrating Pubmed abstracts, Gene Ontology, Sequence simi...

Bibliographic Details
Main Authors: Kumar, Ajay Anand; Van Laer, Lut; Alaerts, Maaike; Ardeshirdavani, Amin; Moreau, Yves; Laukens, Kris; Loeys, Bart; Vandeweyer, Geert
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022555/
https://www.ncbi.nlm.nih.gov/pubmed/29452392
http://dx.doi.org/10.1093/bioinformatics/bty079
author Kumar, Ajay Anand
Van Laer, Lut
Alaerts, Maaike
Ardeshirdavani, Amin
Moreau, Yves
Laukens, Kris
Loeys, Bart
Vandeweyer, Geert
author_sort Kumar, Ajay Anand
collection PubMed
description MOTIVATION: Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool, integrating Pubmed abstracts, Gene Ontology, Sequence similarities, Mammalian and Human Phenotype Ontology, Pathway, Interactions, Disease Ontology, Gene Association database and Human Genome Epidemiology database, into the prediction model. We explore and address effects of sparsity and inter-feature dependencies within annotation sources, and the impact of bias towards specific annotations. RESULTS: pBRIT models feature dependencies and sparsity by an Information-Theoretic (data driven) approach and applies intermediate integration based data fusion. Following the hypothesis that genes underlying similar diseases will share functional and phenotype characteristics, it incorporates Bayesian Ridge regression to learn a linear mapping between functional and phenotype annotations. Genes are prioritized on phenotypic concordance to the training genes. We evaluated pBRIT against nine existing methods, and on over 2000 HPO-gene associations retrieved after construction of pBRIT data sources. We achieve maximum AUC scores ranging from 0.92 to 0.96 against benchmark datasets and of 0.80 against the time-stamped HPO entries, indicating good performance with high sensitivity and specificity. Our model shows stable performance with regard to changes in the underlying annotation data, is fast and scalable for implementation in routine pipelines. AVAILABILITY AND IMPLEMENTATION: http://biomina.be/apps/pbrit/; https://bitbucket.org/medgenua/pbrit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
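The idea summarized in the description (learn a regularized linear mapping from functional to phenotype annotations, then rank candidates by phenotypic concordance with the training genes) can be illustrated with a toy sketch. This is not the pBRIT implementation: it uses plain ridge regression with a fixed regularization strength (the posterior mean of a Bayesian ridge model when the hyperparameters are held fixed), cosine similarity as a stand-in concordance measure, and invented gene names and feature values. Every function name below is hypothetical.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_ridge(X, Y, lam):
    """Ridge solution W = (X^T X + lam * I)^-1 X^T Y."""
    Xt = transpose(X)
    gram = matmul(Xt, X)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(Xt, Y))

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prioritize(W, seed_Y, candidates):
    """Rank candidates by similarity of predicted phenotype to the seed centroid."""
    centroid = [sum(col) / len(seed_Y) for col in zip(*seed_Y)]
    scores = {}
    for name, x in candidates.items():
        predicted = matmul([x], W)[0]
        scores[name] = cosine(predicted, centroid)
    return sorted(scores, key=scores.get, reverse=True)

# Invented toy data: rows are genes, columns are annotation features.
annotated_X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # functional features
annotated_Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # phenotype features
seed_Y = [[1.0, 0.0]]  # phenotype profile of the training (seed) genes
candidates = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [1.0, 1.0],
}

W = fit_ridge(annotated_X, annotated_Y, lam=0.1)  # lam is an arbitrary choice
ranking = prioritize(W, seed_Y, candidates)
print(ranking)
```

In this toy setup the candidate whose functional profile maps closest to the seed phenotype profile is ranked first. A full Bayesian treatment would also estimate the noise and prior precisions from the data rather than fixing `lam`.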
format Online
Article
Text
id pubmed-6022555
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022555 2018-07-10 pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion Kumar, Ajay Anand Van Laer, Lut Alaerts, Maaike Ardeshirdavani, Amin Moreau, Yves Laukens, Kris Loeys, Bart Vandeweyer, Geert Bioinformatics Original Papers Oxford University Press 2018-07-01 2018-02-14 /pmc/articles/PMC6022555/ /pubmed/29452392 http://dx.doi.org/10.1093/bioinformatics/bty079 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022555/
https://www.ncbi.nlm.nih.gov/pubmed/29452392
http://dx.doi.org/10.1093/bioinformatics/bty079