Generating confidence intervals on biological networks

BACKGROUND: In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to...

Bibliographic Details
Main authors: Thorne, Thomas; Stumpf, Michael PH
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2241843/
https://www.ncbi.nlm.nih.gov/pubmed/18053130
http://dx.doi.org/10.1186/1471-2105-8-467
_version_ 1782150539488788480
author Thorne, Thomas
Stumpf, Michael PH
collection PubMed
description BACKGROUND: In the analysis of networks we frequently need to assess the statistical significance of some network statistic, such as a measure of similarity between the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes, and in general these dependencies must be accounted for in the statistical analysis. To this end we require some form of Null model of the network: generally, rewired replicates of the network are generated that preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available. METHODS: We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration, we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of the statistical significance of correlations and motif abundances in the Saccharomyces cerevisiae protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data. RESULTS: Using the protein interaction network of S. cerevisiae, we assessed correlations between the evolutionary rates and expression levels of interacting proteins, and their statistical significance, under Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears to describe the properties of real biological networks better.
CONCLUSION: An improved approach to the statistical analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular, we demonstrate that the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.
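The degree-preserving rewiring that the abstract describes as the usual Null model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' GOcardShuffle algorithm, whose details the abstract does not give. The function names (`degree_sequence`, `rewire`, `empirical_p_value`) and the optional `allowed` hook are hypothetical; the hook only hints at how an annotation constraint might be layered on top of the standard edge swap.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Count how many edges touch each node."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def rewire(edges, n_swaps, rng, allowed=None):
    """Return a rewired copy of an undirected simple graph.

    Each accepted swap turns edges (a, b), (c, d) into (a, d), (c, b),
    so every node keeps its degree. `allowed` is a hypothetical hook
    (an assumption, not part of the source): when given, a swap is
    accepted only if allowed(a, b, c, d) returns True.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    attempts = 0
    max_attempts = 100 * n_swaps  # give up rather than loop forever
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        # Edges are undirected, so pick the orientation of the second at random.
        if rng.random() < 0.5:
            c, d = d, c
        # Reject swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        if allowed is not None and not allowed(a, b, c, d):
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the +1 correction."""
    exceed = sum(1 for x in null_values if x >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

In use, one would compute the statistic of interest on the observed network, recompute it on many replicates returned by `rewire`, and pass both to `empirical_p_value`. Passing, for example, `allowed=lambda a, b, c, d: labels[b] == labels[d]` (with a hypothetical `labels` mapping from node to annotation) keeps the annotation pair of every edge fixed. That is one simple way to condition on annotations and should not be read as the published method.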
format Text
id pubmed-2241843
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2241843 2008-02-14 Generating confidence intervals on biological networks Thorne, Thomas Stumpf, Michael PH BMC Bioinformatics Methodology Article BioMed Central 2007-11-30 /pmc/articles/PMC2241843/ /pubmed/18053130 http://dx.doi.org/10.1186/1471-2105-8-467 Text en Copyright © 2007 Thorne and Stumpf; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Generating confidence intervals on biological networks
topic Methodology Article