Inductive matrix completion for predicting gene–disease associations

Bibliographic Details
Main authors:	Natarajan, Nagarajan; Dhillon, Inderjit S.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	ISMB 2014 Proceedings Papers Committee
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058925/ https://www.ncbi.nlm.nih.gov/pubmed/24932006 http://dx.doi.org/10.1093/bioinformatics/btu269

collection	PubMed
description	Motivation: Most existing methods for predicting causal disease genes rely on a specific type of evidence and are therefore limited in their applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by text mining, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene–disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene–disease associations. We construct features from different biological sources, such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive: it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods, which are transductive. Results: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared with a <15% chance for the recently proposed Catapult method (second best). We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes not previously linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Availability: Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. Contact: naga86@cs.utexas.edu
id	pubmed-4058925
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2014-06-18 2014-06-15 2014-06-11
rights	© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
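The abstract describes the core idea only in words: genes and diseases are each represented by feature vectors, a small set of latent factors links the two, and a disease never seen during training can still be scored from its features alone. The sketch below illustrates that idea in Python with NumPy under explicit assumptions: the bilinear model P ≈ X W H^T Y^T (X: gene features, Y: disease features, P: gene-by-disease association matrix), a squared loss over every gene–disease pair with unobserved pairs treated as 0, a ridge penalty lam, and plain alternating least squares. The loss, the solver, the function names (fit_imc, score_new_disease) and the synthetic data are hypothetical illustrations, not the authors' released code, which is available from the Availability URL in the description.

```python
import numpy as np


def _solve_block(F, P, B, lam):
    """Exact minimizer over V of ||P - F V B^T||_F^2 + lam * ||V||_F^2.

    The zero-gradient condition F^T F V B^T B + lam V = F^T P B is linear in V;
    column-major vectorization turns it into
    (B^T B kron F^T F + lam I) vec(V) = vec(F^T P B).
    """
    f, k = F.shape[1], B.shape[1]
    K = np.kron(B.T @ B, F.T @ F) + lam * np.eye(f * k)
    rhs = (F.T @ P @ B).reshape(-1, order="F")
    return np.linalg.solve(K, rhs).reshape(f, k, order="F")


def objective(P, X, Y, W, H, lam):
    """Squared reconstruction error of P ~ X W H^T Y^T plus the ridge penalty."""
    R = P - X @ W @ H.T @ Y.T
    return float(np.sum(R ** 2) + lam * (np.sum(W ** 2) + np.sum(H ** 2)))


def fit_imc(P, X, Y, k, lam=1e-3, iters=20, seed=0):
    """Alternating least squares for the bilinear model P ~ X W H^T Y^T.

    P: genes x diseases (assumed 1 = known association, 0 = unobserved).
    X: genes x gene features; Y: diseases x disease features; k: latent rank.
    Returns W (gene features x k), H (disease features x k) and the objective
    after each sweep. Each block update is an exact minimization, so the
    recorded objective never increases.
    """
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((Y.shape[1], k))
    history = []
    for _ in range(iters):
        W = _solve_block(X, P, Y @ H, lam)    # update W with H fixed
        H = _solve_block(Y, P.T, X @ W, lam)  # update H with W fixed (transposed problem)
        history.append(objective(P, X, Y, W, H, lam))
    return W, H, history


def score_new_disease(X, W, H, y_new):
    """Inductive step: score every gene for a disease known only by its feature vector."""
    return X @ (W @ (H.T @ y_new))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 8))  # 40 genes, 8 hypothetical gene features
    Y = rng.standard_normal((30, 6))  # 30 diseases, 6 hypothetical disease features
    # Synthetic real-valued rank-3 target, used only to exercise the code.
    P = X @ rng.standard_normal((8, 3)) @ rng.standard_normal((6, 3)).T @ Y.T
    W, H, hist = fit_imc(P[:, :29], X, Y[:29], k=3)  # disease 29 is never seen
    scores = score_new_disease(X, W, H, Y[29])
    print("top-5 genes for the held-out disease:", np.argsort(-scores)[:5])
```

Because a disease enters this model only through its feature row y, scoring an unseen disease is a single product X W H^T y with no retraining, whereas a transductive matrix-completion model that learns a free latent vector per disease column has nothing to offer for a column it never observed. The Kronecker-product solve is practical only while (number of features) x k stays small; larger problems would call for an iterative solver.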