
Statistical significance for hierarchical clustering in genetic association and microarray expression studies

BACKGROUND: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large num...


Bibliographic Details
Main Authors:	Levenstien, Mark A; Yang, Yaning; Ott, Jürg
Format:	Text
Language:	English
Published:	BioMed Central 2003
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC328091/ https://www.ncbi.nlm.nih.gov/pubmed/14667254 http://dx.doi.org/10.1186/1471-2105-4-62

author	Levenstien, Mark A; Yang, Yaning; Ott, Jürg
author_facet	Levenstien, Mark A; Yang, Yaning; Ott, Jürg
author_sort	Levenstien, Mark A
collection	PubMed
description	BACKGROUND: With the increasing amount of data generated in molecular genetics laboratories, it is often difficult to make sense of results because of the vast number of different outcomes or variables studied. Examples include expression levels for large numbers of genes and haplotypes at large numbers of loci. It is then natural to group observations into smaller numbers of classes that allow for an easier overview and interpretation of the data. This grouping is often carried out in multiple steps with the aid of hierarchical cluster analysis, each step leading to a smaller number of classes by combining similar observations or classes. At each step, either implicitly or explicitly, researchers tend to interpret results and eventually focus on that set of classes providing the "best" (most significant) result. While this approach makes sense, the overall statistical significance of the experiment must include the clustering process, which modifies the grouping structure of the data and often removes variation. RESULTS: For hierarchically clustered data, we propose considering the strongest result or, equivalently, the smallest p-value as the experiment-wise statistic of interest and evaluating its significance level for a global assessment of statistical significance. We apply our approach to datasets from haplotype association and microarray expression studies where hierarchical clustering has been used. CONCLUSION: In all of the cases we examine, we find that relying on one set of classes in the course of clustering leads to significance levels that are too small when compared with the significance level associated with an overall statistic that incorporates the process of clustering. In other words, relying on one step of clustering may furnish a formally significant result while the overall experiment is not significant.
format	Text
id	pubmed-328091
institution	National Center for Biotechnology Information
language	English
publishDate	2003
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-328091 2004-02-05 Statistical significance for hierarchical clustering in genetic association and microarray expression studies Levenstien, Mark A; Yang, Yaning; Ott, Jürg BMC Bioinformatics Methodology Article BioMed Central 2003-12-11 /pmc/articles/PMC328091/ /pubmed/14667254 http://dx.doi.org/10.1186/1471-2105-4-62 Text en Copyright © 2003 Levenstien et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle	Methodology Article Levenstien, Mark A Yang, Yaning Ott, Jürg Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title	Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title_full	Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title_fullStr	Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title_full_unstemmed	Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title_short	Statistical significance for hierarchical clustering in genetic association and microarray expression studies
title_sort	statistical significance for hierarchical clustering in genetic association and microarray expression studies
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC328091/ https://www.ncbi.nlm.nih.gov/pubmed/14667254 http://dx.doi.org/10.1186/1471-2105-4-62
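
The idea stated in the abstract of this record (take the smallest p-value over all clustering steps as the experiment-wise statistic, then evaluate the significance level of that minimum) can be illustrated with a small permutation sketch. This is not the authors' code. It assumes a case/control design, a chi-square test on the 2 x k table at each step, and a merge order that is fixed in advance rather than derived from the data. The function names, the toy data, and the merge list are all hypothetical.

```python
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (integer df >= 1),
    using the closed-form series for odd and even degrees of freedom."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    if df % 2 == 1:
        total = math.erfc(chi / math.sqrt(2))
        term = chi
        for r in range(1, (df - 1) // 2 + 1):
            if r > 1:
                term *= x / (2 * r - 1)
            total += 2 * z * term
        return total
    series, term = 1.0, 1.0
    for r in range(1, (df - 2) // 2 + 1):
        term *= x / (2 * r)
        series += term
    return math.sqrt(2 * math.pi) * z * series


def chi2_pvalue(status, groups):
    """P-value of the chi-square test on the 2 x k table status-by-group."""
    n, n_cases = len(status), sum(status)
    if n_cases == 0 or n_cases == n:
        return 1.0
    counts = {}
    for s, g in zip(status, groups):
        counts.setdefault(g, [0, 0])[s] += 1
    df = len(counts) - 1
    if df < 1:
        return 1.0
    stat = 0.0
    for controls, cases in counts.values():
        size = controls + cases
        exp_cases = size * n_cases / n
        exp_controls = size - exp_cases
        stat += (cases - exp_cases) ** 2 / exp_cases
        stat += (controls - exp_controls) ** 2 / exp_controls
    return chi2_sf(stat, df)


def min_p_over_steps(status, classes, merges):
    """Smallest p-value over the initial classes and every merge step."""
    mapping = {c: c for c in set(classes)}
    best = chi2_pvalue(status, [mapping[c] for c in classes])
    for keep, absorb in merges:  # assumed, pre-specified merge order
        for c in mapping:
            if mapping[c] == absorb:
                mapping[c] = keep
        best = min(best, chi2_pvalue(status, [mapping[c] for c in classes]))
    return best


def global_p_value(status, classes, merges, n_perm=1000, seed=0):
    """Permutation significance level of the minimum p-value."""
    rng = random.Random(seed)
    observed = min_p_over_steps(status, classes, merges)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any status/class association
        if min_p_over_steps(shuffled, classes, merges) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): 4 classes of 10 individuals, 1 = case, 0 = control.
classes = [0] * 10 + [1] * 10 + [2] * 10 + [3] * 10
status = ([1] * 8 + [0] * 2) + ([1] * 7 + [0] * 3) + ([1] * 3 + [0] * 7) + ([1] * 2 + [0] * 8)
merges = [(0, 1), (2, 3)]  # merge class 1 into 0, then class 3 into 2

observed_min_p, experiment_wise_p = global_p_value(status, classes, merges, n_perm=500)
print("smallest step-wise p-value:", observed_min_p)
print("experiment-wise p-value:   ", experiment_wise_p)
```

Comparing the two printed values shows the contrast the abstract describes between the significance of the single best step and the significance of the experiment as a whole. In a real analysis the clustering would normally be recomputed within each permutation if it depends on the data being tested.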

