
Accounting for Redundancy when Integrating Gene Interaction Databases

In recent years, gene interaction networks have increasingly been used for the assessment and interpretation of biological measurements. Knowledge of the interaction partners of an unknown protein allows scientists to understand the complex relationships between genetic products, helps to revea...


Bibliographic Details
Main authors:	Elefsinioti, Antigoni; Ackermann, Marit; Beyer, Andreas
Format:	Text
Language:	English
Published:	Public Library of Science 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760779/ https://www.ncbi.nlm.nih.gov/pubmed/19847299 http://dx.doi.org/10.1371/journal.pone.0007492

author	Elefsinioti, Antigoni; Ackermann, Marit; Beyer, Andreas
author_facet	Elefsinioti, Antigoni; Ackermann, Marit; Beyer, Andreas
author_sort	Elefsinioti, Antigoni
collection	PubMed
description	In recent years, gene interaction networks have increasingly been used for the assessment and interpretation of biological measurements. Knowledge of the interaction partners of an unknown protein allows scientists to understand the complex relationships between genetic products, helps to reveal unknown biological functions and pathways, and provides a more detailed picture of an organism's complexity. Measuring all protein interactions under all relevant conditions is virtually impossible. Hence, computational methods that integrate different datasets to predict gene interactions are needed. However, when integrating different sources, one has to account for the fact that some of the information may be redundant, which may lead to an overestimation of the true likelihood of an interaction. Our method integrates information derived from three different databases (Bioverse, HiMAP and STRING) to predict human gene interactions. A Bayesian approach was implemented to integrate the different data sources on a common quantitative scale. An important assumption of the Bayesian integration is independence of the input data (features). Our study shows that conditional dependency cannot be ignored when combining gene interaction databases that rely on partially overlapping input data. In addition, we show how the correlation structure between the databases can be detected, and we propose a linear model to correct for this bias. Benchmarking the results against two independent reference datasets shows that the integrated model outperforms the individual datasets. Our method provides an intuitive strategy for weighting the different features while accounting for their conditional dependencies.
format	Text
id	pubmed-2760779
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2760779 2009-10-22 Accounting for Redundancy when Integrating Gene Interaction Databases Elefsinioti, Antigoni; Ackermann, Marit; Beyer, Andreas PLoS One Research Article Public Library of Science 2009-10-22 /pmc/articles/PMC2760779/ /pubmed/19847299 http://dx.doi.org/10.1371/journal.pone.0007492 Text en Elefsinioti et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Accounting for Redundancy when Integrating Gene Interaction Databases
topic	Research Article


Similar items