A comparison of four clustering methods for brain expression microarray data
BACKGROUND: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expre...
Main authors: | Richards, Alexander L; Holmans, Peter; O'Donovan, Michael C; Owen, Michael J; Jones, Lesley |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655095/ https://www.ncbi.nlm.nih.gov/pubmed/19032745 http://dx.doi.org/10.1186/1471-2105-9-490 |
_version_ | 1782165437508747264 |
---|---|
author | Richards, Alexander L; Holmans, Peter; O'Donovan, Michael C; Owen, Michael J; Jones, Lesley |
collection | PubMed |
description | BACKGROUND: DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data, and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are used upon three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed. RESULTS: k-means outperforms the other methods, with 100% gene coverage and GO enrichments only slightly exceeded by memISA and ISA. Those two methods produce greater GO enrichments on the datasets used, but at the cost of much lower gene coverage, fewer clusters produced, and speed. The clusters they find are largely different to those produced by k-means. Combining clusters produced by k-means and memISA or ISA leads to increased GO enrichment and number of clusters produced (compared to k-means alone), without negatively impacting gene coverage. memISA can also find potentially disease-related clusters. In two independent dorsolateral prefrontal cortex datasets, it finds three overlapping clusters that are either enriched for genes associated with schizophrenia, genes differentially expressed in schizophrenia, or both. Two of these clusters are enriched for genes of the MAP kinase pathway, suggesting a possible role for this pathway in the aetiology of schizophrenia. CONCLUSION: Considered alone, k-means clustering is the most effective of the four methods on typical microarray brain expression datasets. However, memISA and ISA can add extra high-quality clusters to the set produced by k-means, so combining these three methods is the method of choice. |
format | Text |
id | pubmed-2655095 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2655095 2009-03-14 A comparison of four clustering methods for brain expression microarray data Richards, Alexander L; Holmans, Peter; O'Donovan, Michael C; Owen, Michael J; Jones, Lesley BMC Bioinformatics Research Article BioMed Central 2008-11-25 /pmc/articles/PMC2655095/ /pubmed/19032745 http://dx.doi.org/10.1186/1471-2105-9-490 Text en Copyright © 2008 Richards et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A comparison of four clustering methods for brain expression microarray data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655095/ https://www.ncbi.nlm.nih.gov/pubmed/19032745 http://dx.doi.org/10.1186/1471-2105-9-490 |