Workflows for microarray data processing in the Kepler environment
BACKGROUND: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines...
Main Authors: | Stropp, Thomas; McPhillips, Timothy; Ludäscher, Bertram; Bieda, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431220/ https://www.ncbi.nlm.nih.gov/pubmed/22594911 http://dx.doi.org/10.1186/1471-2105-13-102 |
_version_ | 1782242040426266624 |
---|---|
author | Stropp, Thomas McPhillips, Timothy Ludäscher, Bertram Bieda, Mark |
author_facet | Stropp, Thomas McPhillips, Timothy Ludäscher, Bertram Bieda, Mark |
author_sort | Stropp, Thomas |
collection | PubMed |
description | BACKGROUND: Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services. RESULTS: We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems. CONCLUSIONS: We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services. |
format | Online Article Text |
id | pubmed-3431220 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3431220 2012-08-31 Workflows for microarray data processing in the Kepler environment Stropp, Thomas McPhillips, Timothy Ludäscher, Bertram Bieda, Mark BMC Bioinformatics Software BioMed Central 2012-05-17 /pmc/articles/PMC3431220/ /pubmed/22594911 http://dx.doi.org/10.1186/1471-2105-13-102 Text en Copyright ©2012 Stropp et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Stropp, Thomas McPhillips, Timothy Ludäscher, Bertram Bieda, Mark Workflows for microarray data processing in the Kepler environment |
title | Workflows for microarray data processing in the Kepler environment |
title_full | Workflows for microarray data processing in the Kepler environment |
title_fullStr | Workflows for microarray data processing in the Kepler environment |
title_full_unstemmed | Workflows for microarray data processing in the Kepler environment |
title_short | Workflows for microarray data processing in the Kepler environment |
title_sort | workflows for microarray data processing in the kepler environment |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431220/ https://www.ncbi.nlm.nih.gov/pubmed/22594911 http://dx.doi.org/10.1186/1471-2105-13-102 |
work_keys_str_mv | AT stroppthomas workflowsformicroarraydataprocessinginthekeplerenvironment AT mcphillipstimothy workflowsformicroarraydataprocessinginthekeplerenvironment AT ludascherbertram workflowsformicroarraydataprocessinginthekeplerenvironment AT biedamark workflowsformicroarraydataprocessinginthekeplerenvironment |