
Dynamic association rules for gene expression data analysis

BACKGROUND: The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association based on gene expression profile has been used to determine whether the induction/repression of genes correspond to phenotypic v...


Bibliographic Details
Main Authors:	Chen, Shu-Chuan; Tsai, Tsung-Hsien; Chung, Cheng-Han; Li, Wen-Hsiung
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606551/ https://www.ncbi.nlm.nih.gov/pubmed/26467206 http://dx.doi.org/10.1186/s12864-015-1970-x

_version_	1782395371509514240
author	Chen, Shu-Chuan Tsai, Tsung-Hsien Chung, Cheng-Han Li, Wen-Hsiung
author_sort	Chen, Shu-Chuan
collection	PubMed
description	BACKGROUND: The purpose of gene expression analysis is to look for the association between regulation of gene expression levels and phenotypic variations. This association, based on gene expression profiles, has been used to determine whether the induction/repression of genes corresponds to phenotypic variations including cell regulations, clinical diagnoses and drug development. Statistical analyses of microarray data have been developed to resolve the gene selection issue. However, these methods do not inform us of causality between genes and phenotypes. In this paper, we propose the dynamic association rule algorithm (DAR algorithm), which helps one efficiently select a subset of significant genes for subsequent analysis. The DAR algorithm is based on association rules from market basket analysis in marketing. We first propose a statistical approach, based on constructing a one-sided confidence interval and hypothesis testing, to determine whether an association rule is meaningful. Based on the proposed statistical method, we then developed the DAR algorithm for gene expression data analysis. The method was applied to analyze four microarray datasets and one Next Generation Sequencing (NGS) dataset: the Mice Apo A1 dataset, the whole genome expression dataset of mouse embryonic stem cells, expression profiling of the bone marrow of leukemia patients, the Microarray Quality Control (MAQC) dataset and the RNA-seq dataset of a mouse genomic imprinting study. A comparison of the proposed method with the t-test on the expression profiling of the bone marrow of leukemia patients was conducted. RESULTS: We developed a statistical approach, based on the concept of a confidence interval, to determine the minimum support and minimum confidence for mining association relationships among items. With the minimum support and minimum confidence, one can find significant rules in a single step. The DAR algorithm was then developed for gene expression data analysis. Four gene expression datasets showed that the proposed DAR algorithm not only was able to identify a set of differentially expressed genes that largely agreed with that of other methods, but also provided an efficient and accurate way to find influential genes of a disease. CONCLUSIONS: In this paper, the well-established association rule mining technique from marketing has been successfully modified to determine the minimum support and minimum confidence based on the concepts of confidence intervals and hypothesis testing. It can be applied to gene expression data to mine significant association rules between gene regulation and phenotype. The proposed DAR algorithm provides an efficient way to find influential genes that underlie the phenotypic variance.
format	Online Article Text
id	pubmed-4606551
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4606551 2015-10-16 Dynamic association rules for gene expression data analysis Chen, Shu-Chuan Tsai, Tsung-Hsien Chung, Cheng-Han Li, Wen-Hsiung BMC Genomics Methodology Article BioMed Central 2015-10-14 /pmc/articles/PMC4606551/ /pubmed/26467206 http://dx.doi.org/10.1186/s12864-015-1970-x Text en © Chen et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Chen, Shu-Chuan Tsai, Tsung-Hsien Chung, Cheng-Han Li, Wen-Hsiung Dynamic association rules for gene expression data analysis
title	Dynamic association rules for gene expression data analysis
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4606551/ https://www.ncbi.nlm.nih.gov/pubmed/26467206 http://dx.doi.org/10.1186/s12864-015-1970-x
work_keys_str_mv	AT chenshuchuan dynamicassociationrulesforgeneexpressiondataanalysis AT tsaitsunghsien dynamicassociationrulesforgeneexpressiondataanalysis AT chungchenghan dynamicassociationrulesforgeneexpressiondataanalysis AT liwenhsiung dynamicassociationrulesforgeneexpressiondataanalysis

