Cargando…

CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis

With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test wh...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Hong-Qiang, Tsai, Chung-Jui
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806744/
https://www.ncbi.nlm.nih.gov/pubmed/24194884
http://dx.doi.org/10.1371/journal.pone.0077429
_version_ 1782288422951452672
author Wang, Hong-Qiang
Tsai, Chung-Jui
author_facet Wang, Hong-Qiang
Tsai, Chung-Jui
author_sort Wang, Hong-Qiang
collection PubMed
description With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. SOFTWARE AVAILABILITY: A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads.
format Online
Article
Text
id pubmed-3806744
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-38067442013-11-05 CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis Wang, Hong-Qiang Tsai, Chung-Jui PLoS One Research Article With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. SOFTWARE AVAILABILITY: A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads. Public Library of Science 2013-10-23 /pmc/articles/PMC3806744/ /pubmed/24194884 http://dx.doi.org/10.1371/journal.pone.0077429 Text en © 2013 Wang, Tsai http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Wang, Hong-Qiang
Tsai, Chung-Jui
CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title_full CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title_fullStr CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title_full_unstemmed CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title_short CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
title_sort corsig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806744/
https://www.ncbi.nlm.nih.gov/pubmed/24194884
http://dx.doi.org/10.1371/journal.pone.0077429
work_keys_str_mv AT wanghongqiang corsigageneralframeworkforestimatingstatisticalsignificanceofcorrelationanditsapplicationtogenecoexpressionanalysis
AT tsaichungjui corsigageneralframeworkforestimatingstatisticalsignificanceofcorrelationanditsapplicationtogenecoexpressionanalysis