
Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-...


Bibliographic Details
Main Authors: Do, K-A., McLachlan, G.J., Bean, R., Wen, S.
Format: Text
Language: English
Published: Libertas Academica 2007
Subjects: Systems Biology Special Issue
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2666952/
https://www.ncbi.nlm.nih.gov/pubmed/19390667
author Do, K-A.
McLachlan, G.J.
Bean, R.
Wen, S.
author_sort Do, K-A.
collection PubMed
description Researchers are frequently faced with the analysis of microarray data on a relatively large number of genes measured in a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model-based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays from well-known data sets of colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches were developed from different perspectives, the results demonstrate a clear correspondence between the gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. For the ribosomal protein and smooth muscle genes in the colon data set, both methods are shown to classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces, for both methods, a division of tissues corresponding closely to the external classification into acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). We also identify genes specific to the subgroup of ALL-Tcell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed, allows variable cluster sizes, can incorporate partial or full supervision, and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The intent of the EMMIX-GENE method is to cluster the tissue samples. It performs a filtering step that results in a subset of relevant genes, followed by gene clustering and then tissue clustering, and it performs favorably in the accuracy with which it ranks the clusters produced.
format Text
id pubmed-2666952
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-2666952 2009-04-22 Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data Do, K-A. McLachlan, G.J. Bean, R. Wen, S. Cancer Inform Systems Biology Special Issue Libertas Academica 2007-04-02 /pmc/articles/PMC2666952/ /pubmed/19390667 Text en © 2007 The authors. http://creativecommons.org/licenses/by/3.0 This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
title Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data
topic Systems Biology Special Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2666952/
https://www.ncbi.nlm.nih.gov/pubmed/19390667