
Principles for data analysis workflows

A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible...


Bibliographic Details
Main authors: Stoudt, Sara; Vásquez, Váleri N.; Martinez, Ciera C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects: Education
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7971542/
https://www.ncbi.nlm.nih.gov/pubmed/33735208
http://dx.doi.org/10.1371/journal.pcbi.1008770
author Stoudt, Sara
Vásquez, Váleri N.
Martinez, Ciera C.
collection PubMed
description A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.
format Online
Article
Text
id pubmed-7971542
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7971542 2021-03-31 Principles for data analysis workflows. Stoudt, Sara; Vásquez, Váleri N.; Martinez, Ciera C. PLoS Comput Biol. Education. Public Library of Science 2021-03-18 /pmc/articles/PMC7971542/ /pubmed/33735208 http://dx.doi.org/10.1371/journal.pcbi.1008770 Text en © 2021 Stoudt et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Principles for data analysis workflows
topic Education