Performing statistical analyses on quantitative data in Taverna workflows: An example using R and maxdBrowse to identify differentially-expressed genes from microarray data
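Main authors: | Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B |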
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2008 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2528018/ https://www.ncbi.nlm.nih.gov/pubmed/18687127 http://dx.doi.org/10.1186/1471-2105-9-334 |
author | Li, Peter; Castrillo, Juan I; Velarde, Giles; Wassink, Ingo; Soiland-Reyes, Stian; Owen, Stuart; Withers, David; Oinn, Tom; Pocock, Matthew R; Goble, Carole A; Oliver, Stephen G; Kell, Douglas B |
---|---|
collection | PubMed |
description | BACKGROUND: There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the computational tools employed to analyse such data. For example, tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses. Combining two or more tools in an analysis may also be problematic, since data may have to be copied and pasted manually between the separate user interfaces of each tool. Furthermore, this transfer of data may require a reconciliation step before the tools can interoperate. RESULTS: Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data, followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed by R to identify differentially-expressed genes using the Taverna RShell processor, which was developed to invoke R when it is deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile data mismatches between services and to implement a form of user interaction for selecting subsets of microarray data for analysis during workflow execution. A new plugin system in the Taverna software architecture is demonstrated through renderers that display PDF files and CSV-formatted data within the Taverna workbench. 
CONCLUSION: Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data, combining scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they allow other scientists to use tools such as R to analyse their own data without having to learn the corresponding programming language. |
id | pubmed-2528018 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Software), published 2008-08-07 |
record_date | 2008-09-03 |
rights | Copyright © 2008 Li et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Software |
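The abstract describes identifying differentially-expressed genes with an R script run through Taverna's RShell processor, but the record does not say which statistical test that script applies. The following is therefore only a hedged sketch of the general idea, written in Python rather than R. It swaps in an exact two-sided permutation test on the difference of group means, followed by Benjamini-Hochberg false-discovery-rate adjustment. The function names (`permutation_pvalue`, `benjamini_hochberg`, `differentially_expressed`), the input layout (the first `n_a` values per gene come from one condition) and the toy data are all hypothetical. Nothing here calls Taverna, maxdBrowse, or RServe.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every relabelling of the pooled values into groups of the
    original sizes, so it is only practical for small replicate counts.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        picked = set(idx)
        a = [pooled[i] for i in picked]
        b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        # Small tolerance so relabellings that tie the observed statistic count.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(expression, n_a, alpha=0.05):
    """Return (gene, p, q) tuples for genes whose q-value is at most alpha.

    `expression` maps a gene identifier to its values: the first `n_a`
    entries are condition A, the rest condition B (hypothetical layout).
    """
    genes = sorted(expression)
    pvals = [permutation_pvalue(expression[g][:n_a], expression[g][n_a:])
             for g in genes]
    qvals = benjamini_hochberg(pvals)
    return [(g, p, q) for g, p, q in zip(genes, pvals, qvals) if q <= alpha]


# Toy data invented for illustration: 4 replicates per condition.
EXAMPLE = {
    "gene_A": [1, 2, 3, 4, 10, 11, 12, 13],
    "gene_B": [5, 6, 7, 8, 1, 2, 3, 4],
    "gene_C": [5, 1, 4, 2, 3, 5, 2, 4],
}

if __name__ == "__main__":
    for gene, p, q in differentially_expressed(EXAMPLE, n_a=4):
        print(f"{gene}\tp={p:.4f}\tq={q:.4f}")
```

With four replicates per condition there are 70 possible relabellings, so the smallest achievable p-value is 2/70 (about 0.0286). On the toy data, gene_A and gene_B reach that bound and are reported with q of about 0.0429, while gene_C is not reported. A permutation test was chosen here only because it needs nothing beyond the standard library. A real R analysis would more likely rely on an established package.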