Shrinkage Clustering: a fast and size-constrained clustering algorithm for biomedical applications

Bibliographic details
Main authors: Hu, Chenyue W.; Li, Hanyang; Qutub, Amina A.
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2018
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5782397/
https://www.ncbi.nlm.nih.gov/pubmed/29361928
http://dx.doi.org/10.1186/s12859-018-2022-8
Abstract
BACKGROUND: Many common clustering algorithms require a two-step process that limits their efficiency. The algorithm must be run repeatedly, and it must be paired with a model selection criterion. Both steps are needed to determine the number of clusters present in the data and the corresponding cluster memberships. As biomedical datasets grow in size and prevalence, there is a growing need for methods that are easier to implement and more computationally efficient. In addition, it is often essential to obtain clusters of sufficient sample size so that the clustering result is meaningful and interpretable for subsequent analysis.

RESULTS: We introduce Shrinkage Clustering, a novel clustering algorithm based on matrix factorization that finds the optimal number of clusters while simultaneously partitioning the data. We report its performance across multiple simulated and real datasets, and demonstrate its accuracy and speed when applied to subtyping cancer and brain tissues. In addition, the algorithm offers a straightforward solution to clustering with cluster size constraints.

CONCLUSIONS: Given its ease of implementation, computational efficiency and extensible structure, Shrinkage Clustering can be applied broadly to biomedical clustering tasks, especially those involving large datasets.
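The abstract describes the method only at a high level. The sketch below is offered as an illustration under stated assumptions. It assumes the matrix-factorization objective is min over one-hot assignment matrices A of ||S − AAᵀ||²_F, where S is a symmetric similarity matrix with entries in [0, 1]. This form is an assumption and is not stated in this record. Because (AAᵀ)ᵢⱼ is 1 exactly when i and j share a cluster, the objective equals, up to a constant, the sum of S'ᵢⱼ = 1 − 2Sᵢⱼ over same-cluster pairs. A greedy descent that starts from an over-estimated number of clusters `k0` can therefore let clusters "shrink" away as they lose all members, so the number of clusters falls out of the partitioning itself. Several pieces are hypothetical choices for this sketch and are not the authors' implementation: the function name `shrinkage_cluster`, the parameters `k0` and `min_size`, the deterministic round-robin start, and the rule of dissolving undersized clusters.

```python
def shrinkage_cluster(S, k0, min_size=1):
    """Greedy hard-assignment sketch of min ||S - A A^T||_F^2 (hypothetical).

    S: symmetric n x n similarity matrix, entries assumed to lie in [0, 1].
    k0: initial, deliberately over-estimated number of clusters.
    min_size: hypothetical minimum cluster size (1 means unconstrained).
    Returns one label in 0..k-1 per object.
    """
    n = len(S)
    # Up to a constant, ||S - A A^T||^2 equals the sum of S'[i][j] = 1 - 2 S[i][j]
    # over all ordered pairs (i, j), i != j, that share a cluster.
    Sp = [[1.0 - 2.0 * S[i][j] for j in range(n)] for i in range(n)]
    # Deterministic round-robin start: an assumption made for reproducibility.
    labels = [i % k0 for i in range(n)]

    while True:
        clusters = sorted(set(labels))  # only non-empty clusters survive
        # cost[c][i] = sum of S'[i][j] over members j != i of cluster c
        cost = {c: [0.0] * n for c in clusters}
        for i in range(n):
            for j in range(n):
                if i != j:
                    cost[labels[j]][i] += Sp[i][j]

        # Steepest single-object move: moving i from a to c changes the
        # objective by 2 * (cost[c][i] - cost[a][i]), so pick the largest drop.
        best_gain, best_move = 0.0, None
        for i in range(n):
            for c in clusters:
                gain = cost[labels[i]][i] - cost[c][i]
                if gain > best_gain + 1e-12:
                    best_gain, best_move = gain, (i, c)
        if best_move is not None:
            i, c = best_move
            labels[i] = c
            continue

        # Local optimum reached: dissolve the smallest undersized cluster, if any,
        # sending each member to its cheapest remaining cluster.
        sizes = {c: labels.count(c) for c in clusters}
        small = [c for c in clusters if sizes[c] < min_size]
        if not small or len(clusters) == 1:
            break
        s = min(small, key=lambda c: sizes[c])
        others = [c for c in clusters if c != s]
        for i in range(n):
            if labels[i] == s:
                labels[i] = min(others, key=lambda c: cost[c][i])

    # Relabel the surviving clusters as 0..k-1.
    remap = {c: r for r, c in enumerate(sorted(set(labels)))}
    return [remap[c] for c in labels]


if __name__ == "__main__":
    # Two clear blocks of three objects each, started from six clusters.
    S = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
    print(shrinkage_cluster(S, k0=6))  # [0, 0, 0, 1, 1, 1]
```

Starting from six singleton clusters, four of them empty out during the descent and only two remain. No separate model-selection pass over k is needed. To keep the sketch short, the full cost table is recomputed at every step, which costs O(n²) per move. Updating only the entries touched by a move would be the natural optimization.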

Record ID: pubmed-5782397
Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article)
Published: BioMed Central, 2018-01-23
License: © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.