A systematic comparison of genome-scale clustering algorithms


Bibliographic Details
Main authors: Jay, Jeremy J, Eblen, John D, Zhang, Yun, Benson, Mikael, Perkins, Andy D, Saxton, Arnold M, Voy, Brynn H, Chesler, Elissa J, Langston, Michael A
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382433/
https://www.ncbi.nlm.nih.gov/pubmed/22759431
http://dx.doi.org/10.1186/1471-2105-13-S10-S7
_version_ 1782236498846810112
author Jay, Jeremy J
Eblen, John D
Zhang, Yun
Benson, Mikael
Perkins, Andy D
Saxton, Arnold M
Voy, Brynn H
Chesler, Elissa J
Langston, Michael A
author_sort Jay, Jeremy J
collection PubMed
description BACKGROUND: A wealth of clustering algorithms has been applied to gene co-expression experiments. These algorithms cover a broad range of approaches, from conventional techniques such as k-means and hierarchical clustering, to graphical approaches such as k-clique communities, weighted gene co-expression networks (WGCNA) and paraclique. Comparing these methods to evaluate their relative effectiveness provides guidance for algorithm selection, development and implementation. Most prior work on comparative clustering evaluation has focused on parametric methods. Graph theoretical methods are recent additions to the tool set for the global analysis and decomposition of microarray co-expression matrices, and they have not generally been included in earlier methodological comparisons. In the present study, a variety of parametric and graph theoretical clustering algorithms are compared using well-characterized transcriptomic data at a genome scale from Saccharomyces cerevisiae.
METHODS: For each clustering method under study, a variety of parameters were tested. Jaccard similarity was used to measure each cluster's agreement with every GO and KEGG annotation set, and the highest Jaccard score was assigned to the cluster. Clusters were grouped into small, medium, and large bins, and the Jaccard scores of the five top-scoring clusters in each bin were averaged and reported as the best average top 5 (BAT5) score for the particular method.
RESULTS: Clusters produced by each method were evaluated based upon the positive match to known pathways. This produces a readily interpretable ranking of the relative effectiveness of clustering on the genes. Methods were also tested to determine whether they were able to identify clusters consistent with those identified by other clustering methods.
CONCLUSIONS: Validation of clusters against known gene classifications demonstrates that, for these data, graph-based techniques outperform conventional clustering approaches, suggesting that further development and application of combinatorial strategies is warranted.
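The scoring procedure in the METHODS part of the description can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the bin boundaries in `DEFAULT_BINS`, the handling of bins with fewer than five clusters, and the toy gene identifiers are all assumptions, since the abstract does not specify them. The sketch scores a single clustering; the word "best" in BAT5 presumably refers to taking the best such average over the parameter settings tested for a method, which is not shown here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical size bins (inclusive bounds). The abstract names small,
# medium, and large bins but gives no thresholds, so these are assumed.
DEFAULT_BINS: Dict[str, Tuple[int, int]] = {
    "small": (3, 10),
    "medium": (11, 100),
    "large": (101, 1000),
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_score(cluster: Set[str], annotations: Dict[str, Set[str]]) -> float:
    """Highest Jaccard score of a cluster against any annotation set."""
    return max((jaccard(cluster, genes) for genes in annotations.values()), default=0.0)


def average_top_scores(
    clusters: List[Set[str]],
    annotations: Dict[str, Set[str]],
    bins: Dict[str, Tuple[int, int]] = DEFAULT_BINS,
    top_n: int = 5,
) -> Dict[str, float]:
    """Average the top_n best-match scores within each size bin.

    Bins with no clusters are omitted; bins with fewer than top_n clusters
    are averaged over the clusters present (an assumption).
    """
    scores_by_bin: Dict[str, List[float]] = {name: [] for name in bins}
    for cluster in clusters:
        for name, (low, high) in bins.items():
            if low <= len(cluster) <= high:
                scores_by_bin[name].append(best_match_score(cluster, annotations))
                break  # each cluster belongs to at most one bin
    result: Dict[str, float] = {}
    for name, scores in scores_by_bin.items():
        if scores:
            top = sorted(scores, reverse=True)[:top_n]
            result[name] = sum(top) / len(top)
    return result


# Toy data with made-up gene and annotation identifiers.
example_annotations = {
    "GO:example": {"g1", "g2", "g3"},
    "KEGG:example": {"g4", "g5", "g6", "g7"},
}
example_clusters = [{"g1", "g2", "g3"}, {"g4", "g5", "g8"}]
print(average_top_scores(example_clusters, example_annotations))
```

Passing the bins as a parameter keeps the undocumented thresholds out of the scoring logic, so they can be replaced once the actual values are known.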
format Online
Article
Text
id pubmed-3382433
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3382433 2012-06-28
A systematic comparison of genome-scale clustering algorithms
BMC Bioinformatics
Proceedings
BioMed Central 2012-06-25
/pmc/articles/PMC3382433/ /pubmed/22759431 http://dx.doi.org/10.1186/1471-2105-13-S10-S7
Text en
Copyright ©2012 Jay et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A systematic comparison of genome-scale clustering algorithms
topic Proceedings