
Critical limitations of consensus clustering in class discovery

Consensus clustering (CC) has been adopted for unsupervised class discovery in many genomic studies. It calculates how frequently two samples are grouped together in repeated clustering runs, and uses the resulting pairwise "consensus rates" for visual demonstration that clusters exist, fo...


Bibliographic Details
Main Authors:	Șenbabaoğlu, Yasin, Michailidis, George, Li, Jun Z.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group 2014
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4145288/ https://www.ncbi.nlm.nih.gov/pubmed/25158761 http://dx.doi.org/10.1038/srep06207

author	Șenbabaoğlu, Yasin Michailidis, George Li, Jun Z.
collection	PubMed
description	Consensus clustering (CC) has been adopted for unsupervised class discovery in many genomic studies. It calculates how frequently two samples are grouped together in repeated clustering runs, and uses the resulting pairwise "consensus rates" for visual demonstration that clusters exist, for comparing cluster stability, and for estimating the optimal cluster number (K). However, the sensitivity and specificity of CC have not been systematically assessed. Through simulations we find that CC is able to divide randomly generated unimodal data into apparently stable clusters for a range of K, essentially reporting chance partitions of cluster-less data. For data with known structure, the common implementations of CC perform poorly in identifying the true K. These results suggest that CC should be applied and interpreted with caution. We found that a new metric based on CC, the proportion of ambiguously clustered pairs (PAC), infers K equally or more reliably than similar methods in simulated data with known K. Our overall approach involves the use of realistic null distributions based on the observed gene-gene correlation structure in a given study, and the implementation of PAC to more accurately estimate K. We discuss the strength of our approach in the context of other ensemble-based methods.
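
The pairwise "consensus rates" and the PAC metric mentioned in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `consensus_rates` and `pac`, the input layout (one label list per clustering run, with `None` marking a sample left out of that run), and the default cutoffs 0.1 and 0.9 for "ambiguous" are all assumptions made for illustration; the record itself does not state them.

```python
from itertools import combinations


def consensus_rates(runs, n_samples):
    """Return {(i, j): rate} for sample pairs i < j.

    runs[r][i] is the cluster label of sample i in run r, or None if
    sample i was not part of that run (hypothetical input layout).
    The rate is: runs where i and j share a label / runs containing both.
    """
    together = {}
    sampled = {}
    for labels in runs:
        for i, j in combinations(range(n_samples), 2):
            if labels[i] is None or labels[j] is None:
                continue  # pair not observed together in this run
            sampled[(i, j)] = sampled.get((i, j), 0) + 1
            if labels[i] == labels[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    return {pair: together.get(pair, 0) / count
            for pair, count in sampled.items()}


def pac(rates, lower=0.1, upper=0.9):
    """Proportion of pairs whose consensus rate is strictly between
    lower and upper (cutoffs are assumed defaults, not from the record)."""
    if not rates:
        raise ValueError("no sample pair was observed in any run")
    ambiguous = sum(1 for r in rates.values() if lower < r < upper)
    return ambiguous / len(rates)


# Toy example: four samples, four clustering runs at K = 2.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],      # same partition as the first run, labels swapped
    [0, 0, 0, 1],
    [0, None, 1, 1],   # sample 1 left out of this run
]
rates = consensus_rates(runs, 4)
print(rates[(0, 1)])   # 1.0: always grouped together when both are present
print(pac(rates))      # 0.5: three of the six pairs are ambiguous
```

In a sketch like this, a low PAC at a given K would indicate that most pairs are either almost always or almost never grouped together; comparing PAC across candidate values of K is one way such a metric could be used to choose K.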
format	Online Article Text
id	pubmed-4145288
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Nature Publishing Group
record_format	MEDLINE/PubMed
spelling	pubmed-4145288 2014-09-02 Critical limitations of consensus clustering in class discovery Șenbabaoğlu, Yasin Michailidis, George Li, Jun Z. Sci Rep Article Nature Publishing Group 2014-08-27 /pmc/articles/PMC4145288/ /pubmed/25158761 http://dx.doi.org/10.1038/srep06207 Text en Copyright © 2014, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by-nc-sa/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/
title	Critical limitations of consensus clustering in class discovery
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4145288/ https://www.ncbi.nlm.nih.gov/pubmed/25158761 http://dx.doi.org/10.1038/srep06207
