Investigating reproducibility and tracking provenance – A genomic workflow case study

BACKGROUND: Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge, which i...

Bibliographic Details
Main Authors: Kanwal, Sehrish, Khan, Farah Zaib, Lonie, Andrew, Sinnott, Richard O.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5508699/
https://www.ncbi.nlm.nih.gov/pubmed/28701218
http://dx.doi.org/10.1186/s12859-017-1747-0
_version_ 1783249922375548928
author Kanwal, Sehrish
Khan, Farah Zaib
Lonie, Andrew
Sinnott, Richard O.
author_facet Kanwal, Sehrish
Khan, Farah Zaib
Lonie, Andrew
Sinnott, Richard O.
author_sort Kanwal, Sehrish
collection PubMed
description BACKGROUND: Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge, which is not fully addressed. This is due to incomplete understanding of reproducibility requirements and assumptions of workflow definition approaches. Provenance information should be tracked and used to capture all these requirements supporting reusability of existing workflows. RESULTS: We have implemented a complex but widely deployed bioinformatics workflow using three representative approaches to workflow definition and execution. Through implementation, we identified assumptions implicit in these approaches that ultimately produce insufficient documentation of workflow requirements resulting in failed execution of the workflow. This study proposes a set of recommendations that aims to mitigate these assumptions and guides the scientific community to accomplish reproducible science, hence addressing reproducibility crisis. CONCLUSIONS: Reproducing, adapting or even repeating a bioinformatics workflow in any environment requires substantial technical knowledge of the workflow execution environment, resolving analysis assumptions and rigorous compliance with reproducibility requirements. Towards these goals, we propose conclusive recommendations that along with an explicit declaration of workflow specification would result in enhanced reproducibility of computational genomic analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1747-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5508699
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-55086992017-07-17 Investigating reproducibility and tracking provenance – A genomic workflow case study Kanwal, Sehrish Khan, Farah Zaib Lonie, Andrew Sinnott, Richard O. BMC Bioinformatics Research Article BACKGROUND: Computational bioinformatics workflows are extensively used to analyse genomics data, with different approaches available to support implementation and execution of these workflows. Reproducibility is one of the core principles for any scientific workflow and remains a challenge, which is not fully addressed. This is due to incomplete understanding of reproducibility requirements and assumptions of workflow definition approaches. Provenance information should be tracked and used to capture all these requirements supporting reusability of existing workflows. RESULTS: We have implemented a complex but widely deployed bioinformatics workflow using three representative approaches to workflow definition and execution. Through implementation, we identified assumptions implicit in these approaches that ultimately produce insufficient documentation of workflow requirements resulting in failed execution of the workflow. This study proposes a set of recommendations that aims to mitigate these assumptions and guides the scientific community to accomplish reproducible science, hence addressing reproducibility crisis. CONCLUSIONS: Reproducing, adapting or even repeating a bioinformatics workflow in any environment requires substantial technical knowledge of the workflow execution environment, resolving analysis assumptions and rigorous compliance with reproducibility requirements. Towards these goals, we propose conclusive recommendations that along with an explicit declaration of workflow specification would result in enhanced reproducibility of computational genomic analyses. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1747-0) contains supplementary material, which is available to authorized users. BioMed Central 2017-07-12 /pmc/articles/PMC5508699/ /pubmed/28701218 http://dx.doi.org/10.1186/s12859-017-1747-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Kanwal, Sehrish
Khan, Farah Zaib
Lonie, Andrew
Sinnott, Richard O.
Investigating reproducibility and tracking provenance – A genomic workflow case study
title Investigating reproducibility and tracking provenance – A genomic workflow case study
title_full Investigating reproducibility and tracking provenance – A genomic workflow case study
title_fullStr Investigating reproducibility and tracking provenance – A genomic workflow case study
title_full_unstemmed Investigating reproducibility and tracking provenance – A genomic workflow case study
title_short Investigating reproducibility and tracking provenance – A genomic workflow case study
title_sort investigating reproducibility and tracking provenance – a genomic workflow case study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5508699/
https://www.ncbi.nlm.nih.gov/pubmed/28701218
http://dx.doi.org/10.1186/s12859-017-1747-0
work_keys_str_mv AT kanwalsehrish investigatingreproducibilityandtrackingprovenanceagenomicworkflowcasestudy
AT khanfarahzaib investigatingreproducibilityandtrackingprovenanceagenomicworkflowcasestudy
AT lonieandrew investigatingreproducibilityandtrackingprovenanceagenomicworkflowcasestudy
AT sinnottrichardo investigatingreproducibilityandtrackingprovenanceagenomicworkflowcasestudy