Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics

Bibliographic Details
Main authors:	Dalton, Lori; Ballarin, Virginia; Brun, Marcel
Format:	Text
Language:	English
Published:	Bentham Science Publishers Ltd. 2009
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766793/ https://www.ncbi.nlm.nih.gov/pubmed/20190957 http://dx.doi.org/10.2174/138920209789177601

author	Dalton, Lori; Ballarin, Virginia; Brun, Marcel
collection	PubMed
description	The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze microarrays. In genomics, for example, clustering the genes in a set of microarray data groups together those genes whose expression levels behave similarly across the samples, while clustering the samples themselves offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not trivial, especially when applying them to high-throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus the computational power available. In some cases a very simple algorithm may be appropriate to tackle a problem, but many situations may require a more complex and powerful algorithm better suited to the job at hand. In this paper, we cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
format	Text
id	pubmed-2766793
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Bentham Science Publishers Ltd.
record_format	MEDLINE/PubMed
spelling	pubmed-2766793 2010-03-01; Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics; Dalton, Lori; Ballarin, Virginia; Brun, Marcel; Curr Genomics; Article; Bentham Science Publishers Ltd. 2009-09; Text; en; ©2009 Bentham Science Publishers Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.5/), which permits unrestrictive use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Clustering Algorithms: On Learning, Validation, Performance, and Applications to Genomics
