Biomine: predicting links between biological entities using network models of heterogeneous databases

BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships an...

Bibliographic Details
Main Authors:	Eronen, Lauri; Toivonen, Hannu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505483/ https://www.ncbi.nlm.nih.gov/pubmed/22672646 http://dx.doi.org/10.1186/1471-2105-13-119

_version_	1782250764012355584
author	Eronen, Lauri Toivonen, Hannu
author_facet	Eronen, Lauri Toivonen, Hannu
author_sort	Eronen, Lauri
collection	PubMed
description	BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
format	Online Article Text
id	pubmed-3505483
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-35054832012-11-29 Biomine: predicting links between biological entities using network models of heterogeneous databases Eronen, Lauri Toivonen, Hannu BMC Bioinformatics Methodology Article BACKGROUND: Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS: Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS: The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. 
The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities. BioMed Central 2012-06-06 /pmc/articles/PMC3505483/ /pubmed/22672646 http://dx.doi.org/10.1186/1471-2105-13-119 Text en Copyright ©2012 Eronen and Toivonen; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Eronen, Lauri Toivonen, Hannu Biomine: predicting links between biological entities using network models of heterogeneous databases
title	Biomine: predicting links between biological entities using network models of heterogeneous databases
title_full	Biomine: predicting links between biological entities using network models of heterogeneous databases
title_fullStr	Biomine: predicting links between biological entities using network models of heterogeneous databases
title_full_unstemmed	Biomine: predicting links between biological entities using network models of heterogeneous databases
title_short	Biomine: predicting links between biological entities using network models of heterogeneous databases
title_sort	biomine: predicting links between biological entities using network models of heterogeneous databases
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3505483/ https://www.ncbi.nlm.nih.gov/pubmed/22672646 http://dx.doi.org/10.1186/1471-2105-13-119
work_keys_str_mv	AT eronenlauri biominepredictinglinksbetweenbiologicalentitiesusingnetworkmodelsofheterogeneousdatabases AT toivonenhannu biominepredictinglinksbetweenbiologicalentitiesusingnetworkmodelsofheterogeneousdatabases

Similar Items