
Workflows for microarray data processing in the Kepler environment


Bibliographic Details
Main authors: Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431220/
https://www.ncbi.nlm.nih.gov/pubmed/22594911
http://dx.doi.org/10.1186/1471-2105-13-102
author Stropp, Thomas
McPhillips, Timothy
Ludäscher, Bertram
Bieda, Mark
collection PubMed
description BACKGROUND: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services.
RESULTS: We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems.
CONCLUSIONS: We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
format Online
Article
Text
id pubmed-3431220
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3431220 2012-08-31 Workflows for microarray data processing in the Kepler environment Stropp, Thomas McPhillips, Timothy Ludäscher, Bertram Bieda, Mark BMC Bioinformatics Software BioMed Central 2012-05-17 /pmc/articles/PMC3431220/ /pubmed/22594911 http://dx.doi.org/10.1186/1471-2105-13-102 Text en Copyright ©2012 Stropp et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Workflows for microarray data processing in the Kepler environment
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431220/
https://www.ncbi.nlm.nih.gov/pubmed/22594911
http://dx.doi.org/10.1186/1471-2105-13-102