
Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information

Bibliographic Details
Main authors: Zakeri, Pooya; Simm, Jaak; Arany, Adam; ElShal, Sarah; Moreau, Yves
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022676/
https://www.ncbi.nlm.nih.gov/pubmed/29949967
http://dx.doi.org/10.1093/bioinformatics/bty289
_version_ 1783335729006379008
author Zakeri, Pooya
Simm, Jaak
Arany, Adam
ElShal, Sarah
Moreau, Yves
collection PubMed
description MOTIVATION: Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known. RESULTS: Our gene prioritization method can innovatively not only integrate data sources describing genes, but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that our proposed model can effectively improve accuracy over the well-established gene prioritization method, Endeavour. In particular, our proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities, when compared to Endeavour. AVAILABILITY AND IMPLEMENTATION: The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6022676
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022676 2018-07-05 Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information Zakeri, Pooya Simm, Jaak Arany, Adam ElShal, Sarah Moreau, Yves Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022676/ /pubmed/29949967 http://dx.doi.org/10.1093/bioinformatics/bty289 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022676/
https://www.ncbi.nlm.nih.gov/pubmed/29949967
http://dx.doi.org/10.1093/bioinformatics/bty289