Cargando…
CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis
With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test wh...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2013
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806744/ https://www.ncbi.nlm.nih.gov/pubmed/24194884 http://dx.doi.org/10.1371/journal.pone.0077429 |
_version_ | 1782288422951452672 |
---|---|
author | Wang, Hong-Qiang Tsai, Chung-Jui |
author_facet | Wang, Hong-Qiang Tsai, Chung-Jui |
author_sort | Wang, Hong-Qiang |
collection | PubMed |
description | With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. SOFTWARE AVAILABILITY: A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads. |
format | Online Article Text |
id | pubmed-3806744 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-38067442013-11-05 CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis Wang, Hong-Qiang Tsai, Chung-Jui PLoS One Research Article With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. SOFTWARE AVAILABILITY: A web server for CorSig is provided at http://202.127.200.1:8080/probeWeb. R code for CorSig is freely available for non-commercial use at http://aspendb.uga.edu/downloads. Public Library of Science 2013-10-23 /pmc/articles/PMC3806744/ /pubmed/24194884 http://dx.doi.org/10.1371/journal.pone.0077429 Text en © 2013 Wang, Tsai http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Wang, Hong-Qiang Tsai, Chung-Jui CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title | CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title_full | CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title_fullStr | CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title_full_unstemmed | CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title_short | CorSig: A General Framework for Estimating Statistical Significance of Correlation and Its Application to Gene Co-Expression Analysis |
title_sort | corsig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3806744/ https://www.ncbi.nlm.nih.gov/pubmed/24194884 http://dx.doi.org/10.1371/journal.pone.0077429 |
work_keys_str_mv | AT wanghongqiang corsigageneralframeworkforestimatingstatisticalsignificanceofcorrelationanditsapplicationtogenecoexpressionanalysis AT tsaichungjui corsigageneralframeworkforestimatingstatisticalsignificanceofcorrelationanditsapplicationtogenecoexpressionanalysis |