Defining an informativeness metric for clustering gene expression data

Motivation: Unsupervised ‘cluster’ analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a...

Bibliographic Details
Main Authors:	Mar, Jessica C., Wells, Christine A., Quackenbush, John
Format:	Text
Language:	English
Published:	Oxford University Press 2011
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072547/ https://www.ncbi.nlm.nih.gov/pubmed/21330289 http://dx.doi.org/10.1093/bioinformatics/btr074

_version_	1782201574052855808
author	Mar, Jessica C. Wells, Christine A. Quackenbush, John
author_facet	Mar, Jessica C. Wells, Christine A. Quackenbush, John
author_sort	Mar, Jessica C.
collection	PubMed
description	Motivation: Unsupervised ‘cluster’ analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution. Results: To address this problem we developed an ‘informativeness metric’ based on a simple analysis of variance statistic that identifies the number of clusters which best separate phenotypic groups. The performance of the informativeness metric has been tested on both experimental and simulated datasets, and we contrast these results with those obtained using alternative methods such as the gap statistic. Availability: The method has been implemented in the Bioconductor R package attract; it is also freely available from http://compbio.dfci.harvard.edu/pubs/attract_1.0.1.zip. Contact: jess@jimmy.harvard.edu; johnq@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format	Text
id	pubmed-3072547
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-30725472011-04-11 Defining an informativeness metric for clustering gene expression data Mar, Jessica C. Wells, Christine A. Quackenbush, John Bioinformatics Original Papers Motivation: Unsupervised ‘cluster’ analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution. Results: To address this problem we developed an ‘informativeness metric’ based on a simple analysis of variance statistic that identifies the number of clusters which best separate phenotypic groups. The performance of the informativeness metric has been tested on both experimental and simulated datasets, and we contrast these results with those obtained using alternative methods such as the gap statistic. Availability: The method has been implemented in the Bioconductor R package attract; it is also freely available from http://compbio.dfci.harvard.edu/pubs/attract_1.0.1.zip. Contact: jess@jimmy.harvard.edu; johnq@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2011-04-15 2011-02-16 /pmc/articles/PMC3072547/ /pubmed/21330289 http://dx.doi.org/10.1093/bioinformatics/btr074 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Mar, Jessica C. Wells, Christine A. Quackenbush, John Defining an informativeness metric for clustering gene expression data
title	Defining an informativeness metric for clustering gene expression data
title_sort	defining an informativeness metric for clustering gene expression data
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3072547/ https://www.ncbi.nlm.nih.gov/pubmed/21330289 http://dx.doi.org/10.1093/bioinformatics/btr074
work_keys_str_mv	AT marjessicac defininganinformativenessmetricforclusteringgeneexpressiondata AT wellschristinea defininganinformativenessmetricforclusteringgeneexpressiondata AT quackenbushjohn defininganinformativenessmetricforclusteringgeneexpressiondata
