Progeny Clustering: A Method to Identify Biological Phenotypes

Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficie...

Bibliographic Details
Main Authors: Hu, Chenyue W., Kornblau, Steven M., Slater, John H., Qutub, Amina A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4533525/
https://www.ncbi.nlm.nih.gov/pubmed/26267476
http://dx.doi.org/10.1038/srep12894
author Hu, Chenyue W.
Kornblau, Steven M.
Slater, John H.
Qutub, Amina A.
collection PubMed
description Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally computationally efficient, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown to be successful and robust when applied to two synthetic datasets (datasets of two dimensions and ten dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset.
format Online
Article
Text
id pubmed-4533525
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4533525 2015-08-13 Progeny Clustering: A Method to Identify Biological Phenotypes Hu, Chenyue W. Kornblau, Steven M. Slater, John H. Qutub, Amina A. Sci Rep Article Nature Publishing Group 2015-08-12 /pmc/articles/PMC4533525/ /pubmed/26267476 http://dx.doi.org/10.1038/srep12894 Text en Copyright © 2015, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Progeny Clustering: A Method to Identify Biological Phenotypes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4533525/
https://www.ncbi.nlm.nih.gov/pubmed/26267476
http://dx.doi.org/10.1038/srep12894