Critical limitations of consensus clustering in class discovery

Bibliographic Details
Main authors: Șenbabaoğlu, Yasin; Michailidis, George; Li, Jun Z.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4145288/
https://www.ncbi.nlm.nih.gov/pubmed/25158761
http://dx.doi.org/10.1038/srep06207
author Șenbabaoğlu, Yasin
Michailidis, George
Li, Jun Z.
collection PubMed
description Consensus clustering (CC) has been adopted for unsupervised class discovery in many genomic studies. It calculates how frequently two samples are grouped together in repeated clustering runs, and uses the resulting pairwise "consensus rates" for visual demonstration that clusters exist, for comparing cluster stability, and for estimating the optimal cluster number (K). However, the sensitivity and specificity of CC have not been systematically assessed. Through simulations we find that CC is able to divide randomly generated unimodal data into apparently stable clusters for a range of K, essentially reporting chance partitions of cluster-less data. For data with known structure, the common implementations of CC perform poorly in identifying the true K. These results suggest that CC should be applied and interpreted with caution. We found that a new metric based on CC, the proportion of ambiguously clustered pairs (PAC), infers K equally or more reliably than similar methods in simulated data with known K. Our overall approach involves the use of realistic null distributions based on the observed gene-gene correlation structure in a given study, and the implementation of PAC to more accurately estimate K. We discuss the strength of our approach in the context of other ensemble-based methods.
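The description above outlines how CC derives pairwise consensus rates from repeated clustering runs and how PAC summarizes them. The following is a minimal sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names (`kmeans_labels`, `consensus_matrix`, `pac`) are hypothetical, a small k-means stands in for whatever base clustering algorithm a study uses, each run subsamples 80% of the samples, and a pair is counted as "ambiguous" when its consensus rate lies strictly between 0.1 and 0.9.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Tiny Lloyd-style k-means; a stand-in for any base clustering method."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every sample to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def consensus_matrix(X, k, n_runs=50, subsample=0.8, seed=0):
    """Fraction of runs in which two samples co-cluster, among runs
    where both were drawn into the subsample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = max(k, int(round(subsample * n)))
    for _ in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    with np.errstate(invalid="ignore", divide="ignore"):
        # Pairs never sampled together get a rate of 0 by convention here.
        return np.where(sampled > 0, together / sampled, 0.0)


def pac(M, lower=0.1, upper=0.9):
    """Proportion of sample pairs whose consensus rate is neither
    clearly 'together' nor clearly 'apart' (thresholds are assumptions)."""
    vals = M[np.triu_indices_from(M, k=1)]  # each unordered pair once
    return float(np.mean((vals > lower) & (vals < upper)))


# Toy data: two well-separated Gaussian blobs (illustrative only).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
                    rng.normal(5.0, 0.5, size=(30, 2))])
for k in (2, 3, 4):
    print(k, round(pac(consensus_matrix(X_demo, k)), 3))
```

Under this reading, a K whose consensus rates sit mostly near 0 or 1 yields a low PAC, so comparing PAC across candidate K values is one way to rank them. The description's caution still applies: apparently stable partitions can arise in cluster-less data, which is why it pairs PAC with null distributions built from the observed gene-gene correlation structure. That null-model step is not sketched here.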
format Online
Article
Text
id pubmed-4145288
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4145288 2014-09-02 Critical limitations of consensus clustering in class discovery Șenbabaoğlu, Yasin; Michailidis, George; Li, Jun Z. Sci Rep Article
Nature Publishing Group 2014-08-27 /pmc/articles/PMC4145288/ /pubmed/25158761 http://dx.doi.org/10.1038/srep06207 Text en
Copyright © 2014, Macmillan Publishers Limited. All rights reserved. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/
title Critical limitations of consensus clustering in class discovery
topic Article