
From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data

BACKGROUND: The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. For "causal" ana...


Bibliographic Details
Main Authors: Opgen-Rhein, Rainer; Strimmer, Korbinian
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1995222/
https://www.ncbi.nlm.nih.gov/pubmed/17683609
http://dx.doi.org/10.1186/1752-0509-1-37
_version_ 1782135517088841728
author Opgen-Rhein, Rainer
Strimmer, Korbinian
collection PubMed
description BACKGROUND: The use of correlation networks is widespread in the analysis of gene expression and proteomics data, even though it is known that correlations not only confound direct and indirect associations but also provide no means to distinguish between cause and effect. For "causal" analysis, the inference of a directed graphical model is typically required. However, this is rather difficult due to the curse of dimensionality. RESULTS: We propose a simple heuristic for the statistical learning of a high-dimensional "causal" network. The method first converts a correlation network into a partial correlation graph. Subsequently, a partial ordering of the nodes is established by multiple testing of the log-ratio of standardized partial variances. This allows identifying a directed acyclic causal network as a subgraph of the partial correlation network. We illustrate the approach by analyzing a large Arabidopsis thaliana expression data set. CONCLUSION: The proposed approach is a heuristic algorithm that is based on a number of approximations, such as substituting lower order partial correlations by full order partial correlations. Nevertheless, for small samples and for sparse networks the algorithm not only yields sensible first order approximations of the causal structure in high-dimensional genomic data but is also computationally highly efficient. AVAILABILITY AND REQUIREMENTS: The method is implemented in the "GeneNet" R package (version 1.2.0), available from CRAN and from . The software includes an R script for reproducing the network analysis of the Arabidopsis thaliana data.
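The steps named in the description (correlation matrix, partial correlation graph, log-ratio of standardized partial variances, directed subgraph) can be illustrated with a minimal Python sketch. This is not the GeneNet implementation: it assumes more samples than variables so that the plain sample correlation matrix can be inverted, and it replaces the multiple-testing steps with fixed cutoffs. The function names and the cutoffs `pcor_cut` and `logratio_cut` are hypothetical, chosen for illustration only. The orientation rule used here (from the node with the larger standardized partial variance to the node with the smaller one) is likewise stated as an assumption of the sketch.

```python
import numpy as np


def partial_correlations(X):
    """Return (pcor, spv) for a data matrix X with shape (samples, variables).

    Assumption: samples > variables, so the sample correlation matrix
    is invertible. No shrinkage or regularization is applied here.
    """
    R = np.corrcoef(X, rowvar=False)        # correlation matrix
    omega = np.linalg.inv(R)                # inverse correlation matrix
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)          # full-order partial correlations
    np.fill_diagonal(pcor, 1.0)
    spv = 1.0 / np.diag(omega)              # standardized partial variances
    return pcor, spv


def directed_edges(pcor, spv, pcor_cut=0.2, logratio_cut=0.2):
    """Orient edges of the partial correlation graph.

    Assumption: fixed cutoffs stand in for the multiple testing
    described in the abstract. An edge is kept if |pcor| >= pcor_cut
    and oriented from the larger to the smaller standardized partial
    variance if |log-ratio| > logratio_cut. Edges without a clear
    direction are left out of the directed subgraph.
    """
    p = len(spv)
    edges = []
    for k in range(p):
        for l in range(k + 1, p):
            if abs(pcor[k, l]) < pcor_cut:
                continue                    # no edge in the partial correlation graph
            log_ratio = np.log(spv[k] / spv[l])
            if log_ratio > logratio_cut:
                edges.append((k, l))        # k -> l
            elif log_ratio < -logratio_cut:
                edges.append((l, k))        # l -> k
    return edges


if __name__ == "__main__":
    # Toy data: two independent inputs (columns 0 and 1) drive column 2.
    rng = np.random.default_rng(0)
    n = 20000
    x0 = rng.normal(size=n)
    x1 = rng.normal(size=n)
    x2 = x0 + x1 + rng.normal(size=n)
    pcor, spv = partial_correlations(np.column_stack([x0, x1, x2]))
    print(directed_edges(pcor, spv))
```

In the toy data the two inputs are partially correlated once the third variable is conditioned on, but their standardized partial variances are nearly equal, so that edge receives no direction and does not appear in the directed subgraph.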
format Text
id pubmed-1995222
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1995222 2007-09-29 From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data Opgen-Rhein, Rainer Strimmer, Korbinian BMC Syst Biol Methodology Article
BioMed Central 2007-08-06 /pmc/articles/PMC1995222/ /pubmed/17683609 http://dx.doi.org/10.1186/1752-0509-1-37 Text en Copyright © 2007 Opgen-Rhein and Strimmer; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Opgen-Rhein, Rainer
Strimmer, Korbinian
From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data
title From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1995222/
https://www.ncbi.nlm.nih.gov/pubmed/17683609
http://dx.doi.org/10.1186/1752-0509-1-37