Merged consensus clustering to assess and improve class discovery with microarray data

BACKGROUND: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data...

Bibliographic Details
Main Authors: Simpson, T Ian; Armstrong, J Douglas; Jarman, Andrew P
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3002369/
https://www.ncbi.nlm.nih.gov/pubmed/21129181
http://dx.doi.org/10.1186/1471-2105-11-590
_version_ 1782193742544896000
author Simpson, T Ian
Armstrong, J Douglas
Jarman, Andrew P
author_facet Simpson, T Ian
Armstrong, J Douglas
Jarman, Andrew P
author_sort Simpson, T Ian
collection PubMed
description BACKGROUND: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. RESULTS: Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruitfly, Drosophila melanogaster. CONCLUSIONS: Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and sourceforge under the GNU public licence.
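The idea summarised in the description can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the clusterCons API: every function name, the two toy 1-D clustering routines, the 80% subsampling fraction and the example values are assumptions made for illustration. Each algorithm is run on repeated subsamples, the fraction of times two items are clustered together (when both were sampled) forms a consensus matrix, the matrices from the different algorithms are averaged into a merged matrix, and cluster and member robustness are read off as mean consensus values.

```python
import random
from itertools import combinations


def kmeans_1d(values, k, iters=20):
    """Toy 1-D k-means (illustrative); returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic start: centres spread evenly across the sorted values.
    centres = [ordered[(2 * c + 1) * len(ordered) // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


def gap_split_1d(values, k):
    """Toy alternative: cut the sorted values at the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cuts = set(sorted(range(1, len(order)),
                      key=lambda p: values[order[p]] - values[order[p - 1]],
                      reverse=True)[:k - 1])
    labels, current = [0] * len(values), 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1
        labels[idx] = current
    return labels


def consensus_matrix(values, cluster_fn, k, reps=50, frac=0.8, seed=0):
    """Fraction of resampling runs in which each pair of items co-clusters."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]  # times a pair shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times a pair was drawn together
    for _ in range(reps):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = cluster_fn([values[i] for i in idx], k)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [[1.0 if i == j else
             (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
             for j in range(n)] for i in range(n)]


def merge_matrices(matrices):
    """Merged consensus: element-wise mean over the per-algorithm matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def cluster_robustness(matrix, members):
    """Mean consensus over all pairs of items in a proposed cluster."""
    pairs = list(combinations(members, 2))
    return sum(matrix[i][j] for i, j in pairs) / len(pairs) if pairs else 1.0


def member_robustness(matrix, members, item):
    """Mean consensus between one item and the other cluster members."""
    others = [j for j in members if j != item]
    return sum(matrix[item][j] for j in others) / len(others) if others else 1.0


# Made-up data: two tight groups plus one value (index 8) lying between them.
values = [0.1, 0.2, 0.3, 0.25, 5.0, 5.1, 5.2, 4.9, 2.6]
mats = [consensus_matrix(values, fn, k=2) for fn in (kmeans_1d, gap_split_1d)]
merged = merge_matrices(mats)
low = [0, 1, 2, 3]
print("cluster robustness:", cluster_robustness(merged, low))
print("fidelity of item 8:", member_robustness(merged, low + [8], 8))
```

Averaging the matrices means that an item placed inconsistently by the different algorithms, such as the in-between value here, ends up with an intermediate consensus score rather than one dictated by a single method.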
format Text
id pubmed-3002369
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-30023692011-01-06 Merged consensus clustering to assess and improve class discovery with microarray data Simpson, T Ian Armstrong, J Douglas Jarman, Andrew P BMC Bioinformatics Software BACKGROUND: One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. RESULTS: Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruitfly, Drosophila melanogaster. CONCLUSIONS: Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and sourceforge under the GNU public licence.
BioMed Central 2010-12-03 /pmc/articles/PMC3002369/ /pubmed/21129181 http://dx.doi.org/10.1186/1471-2105-11-590 Text en Copyright ©2010 Simpson et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Simpson, T Ian
Armstrong, J Douglas
Jarman, Andrew P
Merged consensus clustering to assess and improve class discovery with microarray data
title Merged consensus clustering to assess and improve class discovery with microarray data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3002369/
https://www.ncbi.nlm.nih.gov/pubmed/21129181
http://dx.doi.org/10.1186/1471-2105-11-590