
Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression

BACKGROUND: The recent advancement of microarray technology with lower noise and better affordability makes it possible to determine expression of several thousand genes simultaneously. The differentially expressed genes are filtered first and then clustered based on the expression profiles of the g...


Bibliographic Details
Main Authors:	Loganantharaj, Raja; Cheepala, Satish; Clifford, John
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683560/ https://www.ncbi.nlm.nih.gov/pubmed/17118148 http://dx.doi.org/10.1186/1471-2105-7-S2-S5

_version_	1782131169459961856
author	Loganantharaj, Raja Cheepala, Satish Clifford, John
author_facet	Loganantharaj, Raja Cheepala, Satish Clifford, John
author_sort	Loganantharaj, Raja
collection	PubMed
description	BACKGROUND: Recent advances in microarray technology, with lower noise and better affordability, make it possible to measure the expression of several thousand genes simultaneously. The differentially expressed genes are filtered first and then clustered based on their expression profiles. A large number of clustering algorithms and distance metrics have been proposed in the literature. The popular ones include hierarchical clustering and k-means clustering, typically used with the Euclidean distance or the Pearson correlation distance. Biologists and practitioners are often unsure which algorithm to use, since there is no clear winner among the algorithms or among the distance metrics. Several validation indices have been proposed in the literature, and these are based directly or indirectly on distances; hence a method that uses any of these indices does not relate to biological features such as biological processes or molecular functions. RESULTS: In this paper we propose a metric that measures the effectiveness of gene clustering algorithms by computing intra-cluster cohesiveness as well as inter-cluster separation with respect to biological features such as biological processes or molecular functions. We applied this metric to clusters of a data set that we created as part of a larger study to determine the cancer-suppressive mechanism of a class of chemicals called retinoids. We considered hierarchical and k-means clustering with Euclidean and Pearson correlation distances. Our results show that genes with similar expression profiles are more likely to be closely related with respect to biological processes than with respect to molecular functions. These findings are supported by many works in the area of gene clustering.
CONCLUSION: The best gene clustering algorithm must achieve cohesiveness within a cluster with respect to some biological features, as well as maximum separation between clusters in terms of the distribution of genes of a behavioral group across clusters. We claim that our proposed metric is novel in this respect and that it provides a measure of both intra-cluster cohesiveness and inter-cluster separation. Best of all, the proposed metric is easy to compute and yields a single quantitative value, which makes it easier to compare different algorithms. A metric value of 0 indicates maximum cluster cohesiveness and maximum inter-cluster separation. We have demonstrated the metric by applying it to a data set with gene behavioral groupings such as biological process and molecular function. The metric can easily be extended to other features of a gene, such as DNA binding sites and protein-protein interactions of the gene product, special features of the intron-exon structure, and promoter characteristics. The metric can also be used in other domains that use two different parametric spaces: one for clustering and the other for measuring effectiveness.
format	Text
id	pubmed-1683560
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1683560 2006-12-05 Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression Loganantharaj, Raja Cheepala, Satish Clifford, John BMC Bioinformatics Proceedings BioMed Central 2006-09-26 /pmc/articles/PMC1683560/ /pubmed/17118148 http://dx.doi.org/10.1186/1471-2105-7-S2-S5 Text en Copyright © 2006 Loganantharaj et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Loganantharaj, Raja Cheepala, Satish Clifford, John Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title	Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title_full	Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title_fullStr	Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title_full_unstemmed	Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title_short	Metric for Measuring the Effectiveness of Clustering of DNA Microarray Expression
title_sort	metric for measuring the effectiveness of clustering of dna microarray expression
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1683560/ https://www.ncbi.nlm.nih.gov/pubmed/17118148 http://dx.doi.org/10.1186/1471-2105-7-S2-S5
work_keys_str_mv	AT loganantharajraja metricformeasuringtheeffectivenessofclusteringofdnamicroarrayexpression AT cheepalasatish metricformeasuringtheeffectivenessofclusteringofdnamicroarrayexpression AT cliffordjohn metricformeasuringtheeffectivenessofclusteringofdnamicroarrayexpression


Similar Items