Cargando…

Assessing statistical significance in causal graphs

BACKGROUND: Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models a...

Descripción completa

Detalles Bibliográficos
Autores principales:	Chindelevitch, Leonid, Loh, Po-Ru, Enayetallah, Ahmed, Berger, Bonnie, Ziemek, Daniel
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2012
Materias:	Methodology Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307026/ https://www.ncbi.nlm.nih.gov/pubmed/22348444 http://dx.doi.org/10.1186/1471-2105-13-35

_version_	1782227277628571648
author	Chindelevitch, Leonid Loh, Po-Ru Enayetallah, Ahmed Berger, Bonnie Ziemek, Daniel
author_facet	Chindelevitch, Leonid Loh, Po-Ru Enayetallah, Ahmed Berger, Bonnie Ziemek, Daniel
author_sort	Chindelevitch, Leonid
collection	PubMed
description	BACKGROUND: Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing. RESULTS: First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds. Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs. CONCLUSION: We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.
format	Online Article Text
id	pubmed-3307026
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-33070262012-03-19 Assessing statistical significance in causal graphs Chindelevitch, Leonid Loh, Po-Ru Enayetallah, Ahmed Berger, Bonnie Ziemek, Daniel BMC Bioinformatics Methodology Article BACKGROUND: Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing. RESULTS: First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds. Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs. CONCLUSION: We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields. BioMed Central 2012-02-20 /pmc/articles/PMC3307026/ /pubmed/22348444 http://dx.doi.org/10.1186/1471-2105-13-35 Text en Copyright ©2012 Chindelevitch et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Chindelevitch, Leonid Loh, Po-Ru Enayetallah, Ahmed Berger, Bonnie Ziemek, Daniel Assessing statistical significance in causal graphs
title	Assessing statistical significance in causal graphs
title_full	Assessing statistical significance in causal graphs
title_fullStr	Assessing statistical significance in causal graphs
title_full_unstemmed	Assessing statistical significance in causal graphs
title_short	Assessing statistical significance in causal graphs
title_sort	assessing statistical significance in causal graphs
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307026/ https://www.ncbi.nlm.nih.gov/pubmed/22348444 http://dx.doi.org/10.1186/1471-2105-13-35
work_keys_str_mv	AT chindelevitchleonid assessingstatisticalsignificanceincausalgraphs AT lohporu assessingstatisticalsignificanceincausalgraphs AT enayetallahahmed assessingstatisticalsignificanceincausalgraphs AT bergerbonnie assessingstatisticalsignificanceincausalgraphs AT ziemekdaniel assessingstatisticalsignificanceincausalgraphs

Assessing statistical significance in causal graphs

Ejemplares similares