A prediction-based resampling method for estimating the number of clusters in a dataset

BACKGROUND: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-ex...

Bibliographic Details
Main authors: Dudoit, Sandrine, Fridlyand, Jane
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC126241/
https://www.ncbi.nlm.nih.gov/pubmed/12184810
collection PubMed
description BACKGROUND: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems. RESULTS: We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study. CONCLUSIONS: Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
format Text
id pubmed-126241
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-126241 2002-09-25 Genome Biol Research BioMed Central 2002 2002-06-25 /pmc/articles/PMC126241/ /pubmed/12184810 Text en Copyright ©2002 Dudoit and Fridlyand, licensee BioMed Central Ltd
topic Research