PiGx: reproducible genomics analysis pipelines with GNU Guix

In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is e...

Bibliographic Details
Main Authors:	Wurmus, Ricardo; Uyar, Bora; Osberg, Brendan; Franke, Vedran; Gosdschan, Alexander; Wreczycka, Katarzyna; Ronen, Jonathan; Akalin, Altuna
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275446/ https://www.ncbi.nlm.nih.gov/pubmed/30277498 http://dx.doi.org/10.1093/gigascience/giy123

author	Wurmus, Ricardo; Uyar, Bora; Osberg, Brendan; Franke, Vedran; Gosdschan, Alexander; Wreczycka, Katarzyna; Ronen, Jonathan; Akalin, Altuna
author_facet	Wurmus, Ricardo; Uyar, Bora; Osberg, Brendan; Franke, Vedran; Gosdschan, Alexander; Wreczycka, Katarzyna; Ronen, Jonathan; Akalin, Altuna
author_sort	Wurmus, Ricardo
collection	PubMed
description	In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations and for the wider dissemination of workflows. However, providing this type of reproducibility and traceability is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies with GNU Guix. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines called PiGx for the analysis of RNA sequencing, chromatin immunoprecipitation sequencing, bisulfite-treated DNA sequencing, and single-cell resolution RNA sequencing. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own datasets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pigx
format	Online Article Text
id	pubmed-6275446
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6275446 2018-12-06 PiGx: reproducible genomics analysis pipelines with GNU Guix Wurmus, Ricardo; Uyar, Bora; Osberg, Brendan; Franke, Vedran; Gosdschan, Alexander; Wreczycka, Katarzyna; Ronen, Jonathan; Akalin, Altuna Gigascience Technical Note Oxford University Press 2018-10-02 /pmc/articles/PMC6275446/ /pubmed/30277498 http://dx.doi.org/10.1093/gigascience/giy123 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	PiGx: reproducible genomics analysis pipelines with GNU Guix
title_sort	pigx: reproducible genomics analysis pipelines with gnu guix
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275446/ https://www.ncbi.nlm.nih.gov/pubmed/30277498 http://dx.doi.org/10.1093/gigascience/giy123
