repo: an R package for data-centered management of bioinformatic pipelines
BACKGROUND: Reproducibility in Data Analysis research has long been a significant concern, particularly in the areas of Bioinformatics and Computational Biology. Towards the aim of developing reproducible and reusable processes, Data Analysis management tools can help giving structure and coherence...
Main author: | Napolitano, Francesco |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5314482/ https://www.ncbi.nlm.nih.gov/pubmed/28209127 http://dx.doi.org/10.1186/s12859-017-1510-6 |
_version_ | 1782508528337944576 |
---|---|
author | Napolitano, Francesco |
author_facet | Napolitano, Francesco |
author_sort | Napolitano, Francesco |
collection | PubMed |
description | BACKGROUND: Reproducibility in Data Analysis research has long been a significant concern, particularly in the areas of Bioinformatics and Computational Biology. Towards the aim of developing reproducible and reusable processes, Data Analysis management tools can help giving structure and coherence to complex data flows. Nonetheless, improved software quality comes at the cost of additional design and planning effort, which may become impractical in rapidly changing development environments. I propose that an adjustment of focus from processes to data in the management of Bioinformatic pipelines may help improving reproducibility with minimal impact on preexisting development practices. RESULTS: In this paper I introduce the repo R package for bioinformatic analysis management. The tool supports a data-centered philosophy that aims at improving analysis reproducibility and reusability with minimal design overhead. The core of repo lies in its support for easy data storage, retrieval, distribution and annotation. In repo the data analysis flow is derived a posteriori from dependency annotations. CONCLUSIONS: The repo package constitutes an unobtrusive data and flow management extension of the R statistical language. Its adoption, together with good development practices, can help improving data analysis management, sharing and reproducibility, especially in the fields of Bioinformatics and Computational Biology. |
format | Online Article Text |
id | pubmed-5314482 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-53144822017-02-24 repo: an R package for data-centered management of bioinformatic pipelines Napolitano, Francesco BMC Bioinformatics Software BACKGROUND: Reproducibility in Data Analysis research has long been a significant concern, particularly in the areas of Bioinformatics and Computational Biology. Towards the aim of developing reproducible and reusable processes, Data Analysis management tools can help giving structure and coherence to complex data flows. Nonetheless, improved software quality comes at the cost of additional design and planning effort, which may become impractical in rapidly changing development environments. I propose that an adjustment of focus from processes to data in the management of Bioinformatic pipelines may help improving reproducibility with minimal impact on preexisting development practices. RESULTS: In this paper I introduce the repo R package for bioinformatic analysis management. The tool supports a data-centered philosophy that aims at improving analysis reproducibility and reusability with minimal design overhead. The core of repo lies in its support for easy data storage, retrieval, distribution and annotation. In repo the data analysis flow is derived a posteriori from dependency annotations. CONCLUSIONS: The repo package constitutes an unobtrusive data and flow management extension of the R statistical language. Its adoption, together with good development practices, can help improving data analysis management, sharing and reproducibility, especially in the fields of Bioinformatics and Computational Biology. 
BioMed Central 2017-02-16 /pmc/articles/PMC5314482/ /pubmed/28209127 http://dx.doi.org/10.1186/s12859-017-1510-6 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Napolitano, Francesco repo: an R package for data-centered management of bioinformatic pipelines |
title | repo: an R package for data-centered management of bioinformatic pipelines |
title_full | repo: an R package for data-centered management of bioinformatic pipelines |
title_fullStr | repo: an R package for data-centered management of bioinformatic pipelines |
title_full_unstemmed | repo: an R package for data-centered management of bioinformatic pipelines |
title_short | repo: an R package for data-centered management of bioinformatic pipelines |
title_sort | repo: an r package for data-centered management of bioinformatic pipelines |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5314482/ https://www.ncbi.nlm.nih.gov/pubmed/28209127 http://dx.doi.org/10.1186/s12859-017-1510-6 |
work_keys_str_mv | AT napolitanofrancesco repoanrpackagefordatacenteredmanagementofbioinformaticpipelines |