A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data

Bibliographic Details
Main authors: Corradi, Luca; Fato, Marco; Porro, Ivan; Scaglione, Silvia; Torterolo, Livia
Format: Text
Language: English
Published: BioMed Central 2008
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596147/
https://www.ncbi.nlm.nih.gov/pubmed/19014540
http://dx.doi.org/10.1186/1471-2105-9-480
collection PubMed
description BACKGROUND: Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles in order to shed light on the complex biological processes responsible for serious diseases; they have great scientific impact and a wide area of application. Several standalone applications have been developed to analyze microarray data. Two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerned with the calculation and analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing). This work is not aimed at replacing existing tools; rather, it provides researchers with a method to analyze large datasets without hardware or software constraints. RESULTS: An application able to compute and analyze gene expression on large datasets has been developed using algorithms provided by dChip. Several tests have been carried out to validate the results and to compare the performance obtained on different infrastructures. Validation tests were performed using a small dataset comparing HUVEC (Human Umbilical Vein Endothelial Cells) and fibroblasts, derived from the same donors, treated with IFN-α. Performance tests were also run, solely to compare performance across environments, using a large dataset of about 1000 samples from breast cancer patients. CONCLUSION: A Grid-enabled software application for the analysis of large microarray datasets has been proposed. The dChip software has been ported to the Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value of Grid technologies is the possibility of exploiting both computational and data Grid infrastructures to analyze large, distributed datasets. The software has been validated, and its performance on cluster and Grid environments has been compared, with quite good scalability results.
id pubmed-2596147
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article), BioMed Central, 2008-11-13
Copyright © 2008 Corradi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Software