A prediction-based resampling method for estimating the number of clusters in a dataset
Main authors: | Dudoit, Sandrine; Fridlyand, Jane |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2002 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC126241/ https://www.ncbi.nlm.nih.gov/pubmed/12184810 |
author | Dudoit, Sandrine; Fridlyand, Jane |
---|---|
collection | PubMed |
description | BACKGROUND: Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems. RESULTS: We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study. CONCLUSIONS: Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters. |
format | Text |
id | pubmed-126241 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2002 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-126241 2002-09-25 A prediction-based resampling method for estimating the number of clusters in a dataset Dudoit, Sandrine Fridlyand, Jane Genome Biol Research BioMed Central 2002 2002-06-25 /pmc/articles/PMC126241/ /pubmed/12184810 Text en Copyright ©2002 Dudoit and Fridlyand, licensee BioMed Central Ltd |
title | A prediction-based resampling method for estimating the number of clusters in a dataset |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC126241/ https://www.ncbi.nlm.nih.gov/pubmed/12184810 |
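
The abstract describes Clest only at a high level: the number of clusters is chosen by how well cluster labels learned on part of the data predict a clustering of the held-out part, with resampling. The following is a minimal Python sketch of that general idea, assuming a simplified setup. It is not the authors' implementation. The clustering routine (k-means with farthest-first initialisation), the nearest-centroid classifier, the Fowlkes-Mallows agreement index, the uniform reference datasets used as a null comparison, the 2/3 learning split, and the thresholds `p_max` and `d_min` are all illustrative assumptions, and every function name (`estimate_k`, `agreement`, `make_blobs` and the rest) is hypothetical.

```python
# Simplified, hypothetical sketch of prediction-based resampling in the spirit of Clest.
# NOT the authors' code: k-means, nearest-centroid prediction, Fowlkes-Mallows agreement,
# uniform reference data and the p_max/d_min thresholds are assumptions for illustration.
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    # Index of the closest centre; also serves as the nearest-centroid classifier.
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points: Sequence[Point], k: int, rng: random.Random, iters: int = 50) -> List[Point]:
    # Farthest-first initialisation from a random start, then Lloyd iterations.
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def fowlkes_mallows(a: Sequence[int], b: Sequence[int]) -> float:
    # Pair-counting agreement of two labelings of the same items (1.0 = identical partitions).
    tp = fp = fn = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            same_a, same_b = a[i] == a[j], b[i] == b[j]
            tp += same_a and same_b
            fp += same_a and not same_b
            fn += same_b and not same_a
    return tp / math.sqrt((tp + fp) * (tp + fn)) if tp else 0.0


def agreement(points: Sequence[Point], k: int, rng: random.Random,
              splits: int = 10, learn_frac: float = 2 / 3) -> float:
    # Median prediction agreement over random learning/test splits.
    scores = []
    for _ in range(splits):
        idx = list(range(len(points)))
        rng.shuffle(idx)
        cut = int(learn_frac * len(points))
        learn = [points[i] for i in idx[:cut]]
        test = [points[i] for i in idx[cut:]]
        centers = kmeans(learn, k, rng)                   # cluster the learning set
        predicted = [nearest(p, centers) for p in test]   # predict test labels from it
        test_centers = kmeans(test, k, rng)               # cluster the test set on its own
        observed = [nearest(p, test_centers) for p in test]
        scores.append(fowlkes_mallows(predicted, observed))
    return statistics.median(scores)


def uniform_reference(points: Sequence[Point], rng: random.Random) -> List[Point]:
    # Null data: uniform over each feature's observed range (an assumed reference model).
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]


def estimate_k(points: Sequence[Point], k_max: int = 5, n_ref: int = 5,
               p_max: float = 0.05, d_min: float = 0.05, seed: int = 0) -> int:
    rng = random.Random(seed)
    candidates = []
    for k in range(2, k_max + 1):
        t_k = agreement(points, k, rng)
        null = [agreement(uniform_reference(points, rng), k, rng) for _ in range(n_ref)]
        d_k = t_k - statistics.mean(null)                 # gain over reference data
        p_k = sum(s >= t_k for s in null) / n_ref         # share of null scores as good
        if p_k <= p_max and d_k >= d_min:
            candidates.append((d_k, k))
    # Largest gain among qualifying k; no qualifying k is read as "no cluster structure".
    return max(candidates)[1] if candidates else 1


def make_blobs(centres, n_per: int, sd: float, seed: int) -> List[Point]:
    # Hypothetical demo data: isotropic Gaussian blobs around the given centres.
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sd) for c in centre)
            for centre in centres for _ in range(n_per)]


if __name__ == "__main__":
    data = make_blobs([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)], 30, 0.5, seed=1)
    print("estimated number of clusters:", estimate_k(data))
```

The null comparison is what lets the sketch return 1 when no k predicts held-out structure better than structureless reference data; raw agreement alone would not distinguish a stable partition from one that any dataset of that shape would produce.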