Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

Bibliographic Details
Main authors: Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2459189/
https://www.ncbi.nlm.nih.gov/pubmed/18564431
http://dx.doi.org/10.1186/1471-2105-9-288
author Yao, Jianchao
Chang, Chunqi
Salmi, Mari L
Hung, Yeung Sam
Loraine, Ann
Roux, Stanley J
collection PubMed
description BACKGROUND: Clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two correlations most widely used as similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing the replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data.
RESULTS: In this study, we describe a novel correlation coefficient, the shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is shown by comparing it with the two most widely used correlation coefficients (the Pearson correlation coefficient and the SD-weighted correlation coefficient), using statistical measures on both synthetic expression data and real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering, were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns.
CONCLUSION: This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodologies.
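The abstract contrasts SCC with two baseline similarity metrics. This record does not give the SCC formula, so the following is only a minimal sketch of the two baselines, not the authors' method. It assumes one common formulation of the SD-weighted correlation, in which each condition's replicate mean is centered on an inverse-variance-weighted mean and scaled by its replicate SD. The function names, the array layout (genes x conditions x replicates), and the simulated data are all hypothetical.

```python
import numpy as np

def replicate_summary(data):
    """Collapse replicates: data has shape (genes, conditions, replicates)."""
    mean = data.mean(axis=2)
    sd = data.std(axis=2, ddof=1)  # sample SD across replicates
    return mean, sd

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

def sd_weighted(x, sx, y, sy):
    """SD-weighted correlation (assumed formulation, see lead-in).

    x, y   : per-condition replicate means for two genes
    sx, sy : per-condition replicate SDs for the same genes
    """
    wx = 1.0 / sx**2
    wy = 1.0 / sy**2
    xbar = np.sum(wx * x) / np.sum(wx)  # inverse-variance-weighted mean
    ybar = np.sum(wy * y) / np.sum(wy)
    zx = (x - xbar) / sx                # noisy conditions contribute less
    zy = (y - ybar) / sy
    return float(np.sum(zx * zy) / np.sqrt(np.sum(zx**2) * np.sum(zy**2)))

# Simulated example: 2 genes, 6 conditions, 4 replicates (made-up data).
rng = np.random.default_rng(0)
signal = np.linspace(0.0, 2.0, 6)
data = signal[None, :, None] + rng.normal(0.0, 0.3, size=(2, 6, 4))
mean, sd = replicate_summary(data)

r_pearson = pearson(mean[0], mean[1])
r_weighted = sd_weighted(mean[0], sd[0], mean[1], sd[1])
print(f"Pearson: {r_pearson:.3f}  SD-weighted: {r_weighted:.3f}")
```

Either coefficient r can be turned into a dissimilarity such as 1 - r and passed to a hierarchical or k-means-style clustering routine. When every condition has the same replicate SD, the weighted form above reduces to the Pearson coefficient of the replicate means.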
format Text
id pubmed-2459189
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2459189 2008-07-14 Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient Yao, Jianchao Chang, Chunqi Salmi, Mari L Hung, Yeung Sam Loraine, Ann Roux, Stanley J BMC Bioinformatics Research Article BioMed Central 2008-06-18 /pmc/articles/PMC2459189/ /pubmed/18564431 http://dx.doi.org/10.1186/1471-2105-9-288 Text en Copyright © 2008 Yao et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient
topic Research Article