Association Rule Based Similarity Measures for the Clustering of Gene Expression Data

In life-threatening diseases such as cancer, where effective diagnosis includes annotation, early detection, distinction, and prediction, data mining and statistical approaches offer the promise of precise, accurate, and functionally robust analysis of gene expression data. The computational e...

Bibliographic Details
Main authors:	Sethi, Prerna; Alagiriswamy, Sathya
Format:	Text
Language:	English
Published:	Bentham Open 2010
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096052/ https://www.ncbi.nlm.nih.gov/pubmed/21603179 http://dx.doi.org/10.2174/1874431101004010063

Description:	In life-threatening diseases such as cancer, where effective diagnosis includes annotation, early detection, distinction, and prediction, data mining and statistical approaches offer the promise of precise, accurate, and functionally robust analysis of gene expression data. The computational extraction of derived patterns from microarray gene expression data is a non-trivial task that involves sophisticated algorithm design and analysis for specific domain discovery. In this paper, we propose a formal approach to feature extraction. We first apply feature selection heuristics based on three statistical impurity measures (the Gini Index, Max Minority, and the Twoing Rule) to obtain the top 100-400 genes. We then analyze the associative dependencies between the genes and assign a weight to each gene based on its degree of participation in the rules. Next, we present weighted Jaccard and vector cosine similarity measures to compute the similarity between the discovered rules. Finally, we group the rules by applying hierarchical clustering. To demonstrate the usability and efficiency of our technique, we applied it to three publicly available multiclass cancer gene expression datasets and performed a biomedical literature search to support the effectiveness of our results.
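The last steps in the description (weighting genes by rule participation, comparing rules, then clustering them) can be illustrated with a minimal Python sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration: the toy `rules`, the weighting scheme in `gene_weights`, the exact forms of `weighted_jaccard` and `weighted_cosine`, and the average-linkage stopping `threshold` are hypothetical and may differ from the authors' definitions.

```python
from collections import Counter
from math import sqrt

# Hypothetical rules: each rule is reduced to the set of genes it mentions.
rules = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g4", "g5"},
    {"g4", "g5", "g6"},
]

def gene_weights(rules):
    """Assumed scheme: weight = fraction of rules in which the gene appears."""
    counts = Counter(g for rule in rules for g in rule)
    return {g: c / len(rules) for g, c in counts.items()}

def weighted_jaccard(a, b, w):
    """Weight of shared genes divided by weight of all genes in either rule."""
    union = sum(w[g] for g in a | b)
    return sum(w[g] for g in a & b) / union if union else 0.0

def weighted_cosine(a, b, w):
    """Cosine of rule vectors whose entry for gene g is w[g] if present, else 0."""
    dot = sum(w[g] ** 2 for g in a & b)
    norm_a = sqrt(sum(w[g] ** 2 for g in a))
    norm_b = sqrt(sum(w[g] ** 2 for g in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(rules, sim, w, threshold):
    """Average-linkage agglomerative clustering over rule indices.

    Repeatedly merges the two most similar clusters and stops once the best
    available similarity drops below `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(rules))]

    def link(c1, c2):
        total = sum(sim(rules[i], rules[j], w) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            (link(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

weights = gene_weights(rules)
groups = cluster(rules, weighted_jaccard, weights, threshold=0.5)
```

With these toy rules, the two rules built from g1 and g2 end up in one group and the two built from g4 and g5 in another. Passing `weighted_cosine` instead of `weighted_jaccard` swaps the similarity measure without changing the clustering routine.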
Journal:	Open Med Inform J
Publication date:	2010-05-28
License:	© Sethi and Alagiriswamy; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
