
Assessing the gain of biological data integration in gene networks inference


Bibliographic Details
Main authors: Vicente, Fábio FR, Lopes, Fabrício M, Hashimoto, Ronaldo F, Cesar, Roberto M
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481449/
https://www.ncbi.nlm.nih.gov/pubmed/23134775
http://dx.doi.org/10.1186/1471-2164-13-S6-S7
_version_ 1782247741092528128
author Vicente, Fábio FR
Lopes, Fabrício M
Hashimoto, Ronaldo F
Cesar, Roberto M
author_facet Vicente, Fábio FR
Lopes, Fabrício M
Hashimoto, Ronaldo F
Cesar, Roberto M
author_sort Vicente, Fábio FR
collection PubMed
description BACKGROUND: A current challenge in gene annotation is to define gene function in the context of the network of relationships instead of using single genes. The inference of gene networks (GNs) has emerged as an approach to better understand the biology of the system and to study how several components of this network interact with each other and keep their functions stable. However, there is generally insufficient data to accurately recover GNs from their expression levels, which leads to the curse of dimensionality, in which the number of variables exceeds the number of samples. One way to mitigate this problem is to integrate biological data instead of using only the expression profiles in the inference process. The use of several types of biological information in inference methods has increased significantly, with the aim of better recovering the connections between genes and reducing false positives. What makes this strategy so interesting is the possibility of confirming known connections through the included biological data, and the possibility of discovering new relationships between genes by observing the expression data. Although several works in data integration have increased the performance of network inference methods, the actual contribution of each type of biological information to the obtained improvement is not clear. METHODS: We propose a methodology for including biological information in an inference algorithm in order to assess the prediction gain obtained by using biological information and expression profiles together. We also evaluated and compared the gain of adding four types of biological information: (a) protein-protein interaction, (b) Rosetta stone fusion proteins, (c) KEGG and (d) KEGG+GO. RESULTS AND CONCLUSIONS: This work presents a first comparison of the gain from using prior biological information in the inference of GNs, considering the eukaryotic organism P. falciparum. Our results indicate that information based on direct interaction can produce a higher improvement in the gain than data about less specific relationships, such as GO or KEGG. Also, as expected, the results show that the use of biological information is a very important approach for improving the inference. We also compared the gain in the inference of the global network and of only the hubs. The results indicate that the use of biological information can improve the identification of the most connected proteins.
format Online
Article
Text
id pubmed-3481449
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3481449 2012-11-02 Assessing the gain of biological data integration in gene networks inference Vicente, Fábio FR Lopes, Fabrício M Hashimoto, Ronaldo F Cesar, Roberto M BMC Genomics Research BioMed Central 2012-10-26 /pmc/articles/PMC3481449/ /pubmed/23134775 http://dx.doi.org/10.1186/1471-2164-13-S6-S7 Text en Copyright ©2012 Vicente et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Vicente, Fábio FR
Lopes, Fabrício M
Hashimoto, Ronaldo F
Cesar, Roberto M
Assessing the gain of biological data integration in gene networks inference
title Assessing the gain of biological data integration in gene networks inference
title_full Assessing the gain of biological data integration in gene networks inference
title_fullStr Assessing the gain of biological data integration in gene networks inference
title_full_unstemmed Assessing the gain of biological data integration in gene networks inference
title_short Assessing the gain of biological data integration in gene networks inference
title_sort assessing the gain of biological data integration in gene networks inference
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481449/
https://www.ncbi.nlm.nih.gov/pubmed/23134775
http://dx.doi.org/10.1186/1471-2164-13-S6-S7
work_keys_str_mv AT vicentefabiofr assessingthegainofbiologicaldataintegrationingenenetworksinference
AT lopesfabriciom assessingthegainofbiologicaldataintegrationingenenetworksinference
AT hashimotoronaldof assessingthegainofbiologicaldataintegrationingenenetworksinference
AT cesarrobertom assessingthegainofbiologicaldataintegrationingenenetworksinference