
Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data

BACKGROUND: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and prov...


Bibliographic Details
Main authors: Becquet, Céline; Blachon, Sylvain; Jeudy, Baptiste; Boulicaut, Jean-Francois; Gandrillon, Olivier
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151169/
https://www.ncbi.nlm.nih.gov/pubmed/12537556
_version_ 1782120659890995200
author Becquet, Céline
Blachon, Sylvain
Jeudy, Baptiste
Boulicaut, Jean-Francois
Gandrillon, Olivier
collection PubMed
description BACKGROUND: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data. RESULTS: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function. CONCLUSIONS: We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques.
format Text
id pubmed-151169
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1511692003-03-13 Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data Becquet, Céline Blachon, Sylvain Jeudy, Baptiste Boulicaut, Jean-Francois Gandrillon, Olivier Genome Biol Research BACKGROUND: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data. RESULTS: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function. CONCLUSIONS: We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques. 
BioMed Central 2002 2002-11-21 /pmc/articles/PMC151169/ /pubmed/12537556 Text en Copyright © 2002 Becquet et al., licensee BioMed Central Ltd
title Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC151169/
https://www.ncbi.nlm.nih.gov/pubmed/12537556