CorrelaGenes: a new tool for the interpretation of the human transcriptome

BACKGROUND: The amount of gene expression data available in public repositories has grown exponentially in recent years, now requiring new data mining tools to transform them into information easily accessible to biologists. RESULTS: By exploiting expression data publicly available in the Gene Expre...

Bibliographic Details
Main authors: Cremaschi, Paolo; Rovida, Sergio; Sacchi, Lucia; Lisa, Antonella; Calvi, Francesca; Montecucco, Alessandra; Biamonti, Giuseppe; Bione, Silvia; Sacchi, Gianni
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016313/
https://www.ncbi.nlm.nih.gov/pubmed/24564370
http://dx.doi.org/10.1186/1471-2105-15-S1-S6
_version_ 1782315490315599872
author Cremaschi, Paolo
Rovida, Sergio
Sacchi, Lucia
Lisa, Antonella
Calvi, Francesca
Montecucco, Alessandra
Biamonti, Giuseppe
Bione, Silvia
Sacchi, Gianni
author_facet Cremaschi, Paolo
Rovida, Sergio
Sacchi, Lucia
Lisa, Antonella
Calvi, Francesca
Montecucco, Alessandra
Biamonti, Giuseppe
Bione, Silvia
Sacchi, Gianni
author_sort Cremaschi, Paolo
collection PubMed
description BACKGROUND: The amount of gene expression data available in public repositories has grown exponentially in recent years, now requiring new data mining tools to transform them into information easily accessible to biologists. RESULTS: By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) database, we developed a new bioinformatics tool aimed at the identification of genes whose expression appeared simultaneously altered in different experimental conditions, thus suggesting co-regulation or coordinated action in the same biological process. To accomplish this task, we used the 978 human GEO Curated DataSets and we manually performed the selection of 2,109 pair-wise comparisons based on their biological rationale. The lists of differentially expressed genes, obtained from the selected comparisons, were stored in a PostgreSQL database and used as data source for the CorrelaGenes tool. Our application uses a customized Association Rule Mining (ARM) algorithm to identify sets of genes showing expression profiles correlated with a gene of interest. The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, and the χ² p-value. The manually curated selection of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. A simulation performed on 100 randomly selected target genes allowed us to evaluate the efficiency of the procedure and to obtain preliminary data demonstrating the consistency of the results. CONCLUSIONS: The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes integrating data obtained from other applications and available in public repositories.
format Online
Article
Text
id pubmed-4016313
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40163132014-05-23 CorrelaGenes: a new tool for the interpretation of the human transcriptome Cremaschi, Paolo Rovida, Sergio Sacchi, Lucia Lisa, Antonella Calvi, Francesca Montecucco, Alessandra Biamonti, Giuseppe Bione, Silvia Sacchi, Gianni BMC Bioinformatics Software BACKGROUND: The amount of gene expression data available in public repositories has grown exponentially in recent years, now requiring new data mining tools to transform them into information easily accessible to biologists. RESULTS: By exploiting expression data publicly available in the Gene Expression Omnibus (GEO) database, we developed a new bioinformatics tool aimed at the identification of genes whose expression appeared simultaneously altered in different experimental conditions, thus suggesting co-regulation or coordinated action in the same biological process. To accomplish this task, we used the 978 human GEO Curated DataSets and we manually performed the selection of 2,109 pair-wise comparisons based on their biological rationale. The lists of differentially expressed genes, obtained from the selected comparisons, were stored in a PostgreSQL database and used as data source for the CorrelaGenes tool. Our application uses a customized Association Rule Mining (ARM) algorithm to identify sets of genes showing expression profiles correlated with a gene of interest. The significance of the correlation is measured by coupling the Lift, a well-known standard ARM index, and the χ² p-value. The manually curated selection of the comparisons and the developed algorithm constitute a new approach in the field of gene expression profiling studies. A simulation performed on 100 randomly selected target genes allowed us to evaluate the efficiency of the procedure and to obtain preliminary data demonstrating the consistency of the results.
CONCLUSIONS: The preliminary results of the simulation showed how CorrelaGenes could contribute to the characterization of molecular pathways and biological processes integrating data obtained from other applications and available in public repositories. BioMed Central 2014-01-10 /pmc/articles/PMC4016313/ /pubmed/24564370 http://dx.doi.org/10.1186/1471-2105-15-S1-S6 Text en Copyright © 2014 Cremaschi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Cremaschi, Paolo
Rovida, Sergio
Sacchi, Lucia
Lisa, Antonella
Calvi, Francesca
Montecucco, Alessandra
Biamonti, Giuseppe
Bione, Silvia
Sacchi, Gianni
CorrelaGenes: a new tool for the interpretation of the human transcriptome
title CorrelaGenes: a new tool for the interpretation of the human transcriptome
title_full CorrelaGenes: a new tool for the interpretation of the human transcriptome
title_fullStr CorrelaGenes: a new tool for the interpretation of the human transcriptome
title_full_unstemmed CorrelaGenes: a new tool for the interpretation of the human transcriptome
title_short CorrelaGenes: a new tool for the interpretation of the human transcriptome
title_sort correlagenes: a new tool for the interpretation of the human transcriptome
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4016313/
https://www.ncbi.nlm.nih.gov/pubmed/24564370
http://dx.doi.org/10.1186/1471-2105-15-S1-S6
work_keys_str_mv AT cremaschipaolo correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT rovidasergio correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT sacchilucia correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT lisaantonella correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT calvifrancesca correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT montecuccoalessandra correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT biamontigiuseppe correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT bionesilvia correlagenesanewtoolfortheinterpretationofthehumantranscriptome
AT sacchigianni correlagenesanewtoolfortheinterpretationofthehumantranscriptome