Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data

BACKGROUND: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thu...

Bibliographic Details
Main Authors:	Hulshizer, Randall, Blalock, Eric M
Format:	Text
Language:	English
Published:	BioMed Central 2007
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1934919/ https://www.ncbi.nlm.nih.gov/pubmed/17615071 http://dx.doi.org/10.1186/1471-2105-8-240

_version_	1782134355596935168
author	Hulshizer, Randall Blalock, Eric M
author_facet	Hulshizer, Randall Blalock, Eric M
author_sort	Hulshizer, Randall
collection	PubMed
description	BACKGROUND: Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons, are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. RESULTS: We developed a four-step, post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification and significance assessment. First, a 1-way analysis of variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for every gene. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is taken as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure, constructs an Excel workbook with standardized graphs of overrepresented patterns, and lists the genes comprising each pattern. The Visual Basic code, installation files for StatiGen, and sample data are available as supplementary material. CONCLUSION: The PPM procedure is designed to augment current microarray analysis procedures by allowing researchers to incorporate all of the information from post hoc tests to establish unique, overarching gene expression patterns in which there is no overlap in gene membership. In our hands, PPM works well for studies using from three to six treatment groups in which the researcher is interested in treatment-related patterns of gene expression. Hardware/software limitations and the extreme number of theoretical expression patterns limit utility for larger numbers of treatment groups. Applied to a published microarray experiment, the StatiGen program successfully flagged patterns that had been manually assigned in prior work, and further identified other gene expression patterns that may be of interest. Thus, over a moderate range of treatment groups, PPM appears to work well. It allows researchers to assign statistical probabilities to patterns of gene expression that fit a priori expectations/hypotheses, it preserves the data's ability to show the researcher interesting, yet unanticipated, gene expression patterns, and it assigns the majority of ANOVA-significant genes to non-overlapping patterns.
format	Text
id	pubmed-1934919
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-19349192007-07-31 Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data Hulshizer, Randall Blalock, Eric M BMC Bioinformatics Methodology Article BioMed Central 2007-07-05 /pmc/articles/PMC1934919/ /pubmed/17615071 http://dx.doi.org/10.1186/1471-2105-8-240 Text en Copyright © 2007 Hulshizer and Blalock; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Hulshizer, Randall Blalock, Eric M Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data
title	Post hoc pattern matching: assigning significance to statistically defined expression patterns in single channel microarray data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1934919/ https://www.ncbi.nlm.nih.gov/pubmed/17615071 http://dx.doi.org/10.1186/1471-2105-8-240
work_keys_str_mv	AT hulshizerrandall posthocpatternmatchingassigningsignificancetostatisticallydefinedexpressionpatternsinsinglechannelmicroarraydata AT blalockericm posthocpatternmatchingassigningsignificancetostatisticallydefinedexpressionpatternsinsinglechannelmicroarraydata
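
The pattern-encoding, Monte Carlo, and Z-score steps summarized in the description field can be illustrated with a rough sketch. This is not the StatiGen code (which is written in Visual Basic) and makes several assumptions that the record does not confirm: the pairwise test is a stand-in Welch-style t statistic compared against a fixed cutoff, the ANOVA gate is approximated by discarding the "no differences" pattern, random data are drawn from a standard normal distribution, and the Z-score is computed as (actual count minus mean simulated count) divided by the standard deviation of simulated counts. All function names and parameters are hypothetical.

```python
import itertools
import random
import statistics
from collections import Counter


def pairwise_calls(groups, t_cutoff=2.5):
    """One call per group pair: 0 = no difference, 1 = first < second, 2 = first > second.

    ASSUMPTION: a Welch-style t statistic against a fixed cutoff stands in
    for whichever post hoc pairwise test the original method uses.
    """
    calls = []
    for a, b in itertools.combinations(groups, 2):
        se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
        t = (statistics.mean(b) - statistics.mean(a)) / se if se > 0 else 0.0
        calls.append(1 if t > t_cutoff else 2 if t < -t_cutoff else 0)
    return calls


def pattern_id(calls):
    """Encode the ternary calls as a base-3 integer, giving each pattern a unique ID."""
    pid = 0
    for c in calls:
        pid = pid * 3 + c
    return pid


def pattern_counts(genes):
    """Count genes per pattern ID, skipping ID 0 (no pairwise differences).

    ASSUMPTION: dropping ID 0 is a crude substitute for the ANOVA filter.
    """
    ids = (pattern_id(pairwise_calls(g)) for g in genes)
    return Counter(i for i in ids if i != 0)


def random_genes(n_genes, n_groups, n_reps, rng):
    """Simulate genes with no treatment effect (ASSUMPTION: standard normal noise)."""
    return [[[rng.gauss(0, 1) for _ in range(n_reps)] for _ in range(n_groups)]
            for _ in range(n_genes)]


def pattern_z_scores(genes, n_sims=200, seed=0):
    """Return (actual counts, Z-score per observed pattern) against simulated random data."""
    rng = random.Random(seed)
    n_groups, n_reps = len(genes[0]), len(genes[0][0])
    actual = pattern_counts(genes)
    sims = [pattern_counts(random_genes(len(genes), n_groups, n_reps, rng))
            for _ in range(n_sims)]
    z = {}
    for pid, count in actual.items():
        freqs = [s.get(pid, 0) for s in sims]
        mu, sd = statistics.mean(freqs), statistics.pstdev(freqs)
        # A pattern never varying in the simulations gets an infinite score.
        z[pid] = (count - mu) / sd if sd > 0 else float("inf")
    return actual, z


if __name__ == "__main__":
    # Toy input: 20 genes, 3 groups of 3 replicates, third group clearly higher.
    gene = [[1.0, 1.1, 0.9], [1.0, 1.1, 0.9], [5.0, 5.1, 4.9]]
    actual, z = pattern_z_scores([gene] * 20, n_sims=50)
    print(actual, z)
```

Each gene here is a list of treatment groups, and each group is a list of replicate values. Because every gene maps to exactly one pattern ID, pattern membership is non-overlapping, as the description states. With k groups there are k(k-1)/2 pairs and therefore up to 3^(k(k-1)/2) IDs under this encoding, which is consistent with the record's remark that the number of theoretical patterns becomes extreme for larger numbers of treatment groups.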

Similar Items