Merged consensus clustering to assess and improve class discovery with microarray data
Main authors: | Simpson, T Ian; Armstrong, J Douglas; Jarman, Andrew P |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3002369/ https://www.ncbi.nlm.nih.gov/pubmed/21129181 http://dx.doi.org/10.1186/1471-2105-11-590 |
author | Simpson, T Ian; Armstrong, J Douglas; Jarman, Andrew P |
---|---|
collection | PubMed |
description | BACKGROUND: One of the most commonly performed tasks when analysing high-throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced. RESULTS: Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best-supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruit fly, Drosophila melanogaster. CONCLUSIONS: Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing results from different methods to be merged, giving an averaged measure of clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and SourceForge under the GNU General Public Licence. |
format | Text |
id | pubmed-3002369 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3002369, 2011-01-06. BMC Bioinformatics, Software. BioMed Central, 2010-12-03. Copyright ©2010 Simpson et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Merged consensus clustering to assess and improve class discovery with microarray data |
topic | Software |
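The abstract describes resampling-based consensus clustering whose results are merged across several clustering methods to give an averaged robustness score. The following is a minimal, hypothetical Python sketch of that idea. It is not the clusterCons API (the package itself is written in R), and every function name in it is illustrative. It assumes the common consensus-clustering definitions: a pair's consensus is the fraction of resamples containing both items in which they share a cluster; cluster robustness is the mean consensus over pairs inside a cluster; membership robustness is an item's mean consensus with the other members of its cluster. "Merging" is assumed to be an element-wise average of the per-method consensus matrices. The 80% subsampling fraction and the two stand-in algorithms (k-means with farthest-point initialisation, and average linkage on squared Euclidean distance) are also assumptions chosen for brevity.

```python
import random
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=25):
    """Lloyd's k-means with farthest-point initialisation (illustrative stand-in)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centre if a cluster empties
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def average_linkage(points, k, rng=None):
    """Agglomerative average linkage on squared distance; deterministic, rng unused."""
    clusters = [[i] for i in range(len(points))]

    def avg(a, b):
        total = sum(dist2(points[i], points[j]) for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=lambda ab: avg(*ab))
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def consensus_matrix(points, k, method, n_resamples=50, frac=0.8, seed=0):
    """Fraction of co-sampled resamples in which each pair of items shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), int(frac * n))  # subsample without replacement
        labels = method([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += int(labels[a] == labels[b])
    # Pairs never drawn together get 0.0 here (an arbitrary choice for the sketch).
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def merge_consensus(matrices):
    """Assumed merge rule: element-wise mean of the per-method consensus matrices."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def cluster_robustness(m, labels):
    """Mean consensus over all pairs inside each reference cluster (NaN for singletons)."""
    out = {}
    for c in set(labels):
        pairs = list(combinations([i for i, lab in enumerate(labels) if lab == c], 2))
        out[c] = sum(m[i][j] for i, j in pairs) / len(pairs) if pairs else float("nan")
    return out


def membership_robustness(m, labels):
    """Each item's mean consensus with the other members of its reference cluster."""
    out = []
    for i, li in enumerate(labels):
        others = [j for j, lj in enumerate(labels) if lj == li and j != i]
        out.append(sum(m[i][j] for j in others) / len(others) if others else float("nan"))
    return out


if __name__ == "__main__":
    # Two tight groups plus one extra point placed between them.
    base = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5)]
    pts = base + [(x + 10, y + 10) for x, y in base] + [(5, 5)]
    merged = merge_consensus([consensus_matrix(pts, 2, kmeans, seed=1),
                              consensus_matrix(pts, 2, average_linkage, seed=2)])
    ref = kmeans(pts, 2, random.Random(0))  # reference partition to score
    print(cluster_robustness(merged, ref))
    print(membership_robustness(merged, ref))
```

Under this averaging rule, a pair scores 1.0 in the merged matrix only if every method co-clusters it in every resample in which it appears. This is one plausible reading of the abstract's "averaged clustering robustness". A weighted mean, or a different reference partition, would be equally valid variations.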