Streamlining data-intensive biology with workflow systems

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool...

Bibliographic Details
Main authors: Reiter, Taylor; Brooks†, Phillip T; Irber†, Luiz; Joslin†, Shannon E K; Reid†, Charles M; Scott†, Camille; Brown, C Titus; Pierce-Ward, N Tessa
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8631065/
https://www.ncbi.nlm.nih.gov/pubmed/33438730
http://dx.doi.org/10.1093/gigascience/giaa140
author Reiter, Taylor
Brooks†, Phillip T
Irber†, Luiz
Joslin†, Shannon E K
Reid†, Charles M
Scott†, Camille
Brown, C Titus
Pierce-Ward, N Tessa
author_sort Reiter, Taylor
collection PubMed
description As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
format Online
Article
Text
id pubmed-8631065
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8631065 2021-12-01 Streamlining data-intensive biology with workflow systems Reiter, Taylor Brooks†, Phillip T Irber†, Luiz Joslin†, Shannon E K Reid†, Charles M Scott†, Camille Brown, C Titus Pierce-Ward, N Tessa Gigascience Review As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field. Oxford University Press 2021-01-13 /pmc/articles/PMC8631065/ /pubmed/33438730 http://dx.doi.org/10.1093/gigascience/giaa140 Text en © The Author(s) 2021. Published by Oxford University Press GigaScience.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Streamlining data-intensive biology with workflow systems
topic Review