
Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical bi...


Bibliographic Details
Main authors:	Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382191/ https://www.ncbi.nlm.nih.gov/pubmed/25830807 http://dx.doi.org/10.1371/journal.pone.0119448

_version_	1782364568837685248
author	Maulik, Ujjwal Mallik, Saurav Mukhopadhyay, Anirban Bandyopadhyay, Sanghamitra
author_facet	Maulik, Ujjwal Mallik, Saurav Mukhopadhyay, Anirban Bandyopadhyay, Sanghamitra
author_sort	Maulik, Ujjwal
collection	PubMed
description	Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify a special type of rules and potential biomarkers from biological datasets using an integrated approach of statistical and binary inclusion-maximal biclustering techniques. First, a novel statistical strategy is used to eliminate insignificant, low-significance, and redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. The corresponding special type of rules is then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it reduces elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level.
format	Online Article Text
id	pubmed-4382191
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4382191 2015-04-09 PLoS One Research Article Public Library of Science 2015-04-01 /pmc/articles/PMC4382191/ /pubmed/25830807 http://dx.doi.org/10.1371/journal.pone.0119448 Text en © 2015 Maulik et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4382191/ https://www.ncbi.nlm.nih.gov/pubmed/25830807 http://dx.doi.org/10.1371/journal.pone.0119448

