A general concept for consistent documentation of computational analyses
The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However,...
Main authors: | Ebert, Peter; Müller, Fabian; Nordström, Karl; Lengauer, Thomas; Schulz, Marcel H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460408/ https://www.ncbi.nlm.nih.gov/pubmed/26055099 http://dx.doi.org/10.1093/database/bav050 |
_version_ | 1782375381008908288 |
---|---|
author | Ebert, Peter Müller, Fabian Nordström, Karl Lengauer, Thomas Schulz, Marcel H. |
author_facet | Ebert, Peter Müller, Fabian Nordström, Karl Lengauer, Thomas Schulz, Marcel H. |
author_sort | Ebert, Peter |
collection | PubMed |
description | The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip |
format | Online Article Text |
id | pubmed-4460408 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-44604082015-06-11 A general concept for consistent documentation of computational analyses Ebert, Peter Müller, Fabian Nordström, Karl Lengauer, Thomas Schulz, Marcel H. Database (Oxford) Original Article The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip Oxford University Press 2015-06-08 /pmc/articles/PMC4460408/ /pubmed/26055099 http://dx.doi.org/10.1093/database/bav050 Text en © The Author(s) 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Ebert, Peter Müller, Fabian Nordström, Karl Lengauer, Thomas Schulz, Marcel H. A general concept for consistent documentation of computational analyses |
title | A general concept for consistent documentation of computational analyses |
title_sort | general concept for consistent documentation of computational analyses |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460408/ https://www.ncbi.nlm.nih.gov/pubmed/26055099 http://dx.doi.org/10.1093/database/bav050 |
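
The two-part scheme summarized in the description (a Process document describing the pipeline, an Analysis document describing one run, and an automated consistency check between them) might be sketched roughly as follows. This is only an illustration: the element and attribute names (`process`, `step`, `analysis`, `run`, `tool_version`, and so on), the tool names, and the `validate` function are all hypothetical and are not taken from the authors' schema or from the pyvalid package.

```python
import xml.etree.ElementTree as ET

# Hypothetical Process document: written once, lists the steps and tool versions.
PROCESS_XML = """
<process name="example-pipeline" version="1.0">
  <step id="align" tool="aligner" tool_version="0.1"/>
  <step id="call" tool="caller" tool_version="0.2"/>
</process>
"""

# Hypothetical Analysis document: records one concrete run of that process.
ANALYSIS_XML = """
<analysis process="example-pipeline" process_version="1.0">
  <run step="align" input="sample1.fastq" output="sample1.bam"/>
  <run step="call" input="sample1.bam" output="sample1.peaks"/>
</analysis>
"""


def validate(process_xml, analysis_xml):
    """Return a list of consistency problems (empty if the documents agree)."""
    process = ET.fromstring(process_xml)
    analysis = ET.fromstring(analysis_xml)
    problems = []

    # The run must point at the process it claims to execute.
    if analysis.get("process") != process.get("name"):
        problems.append("analysis refers to a different process")
    if analysis.get("process_version") != process.get("version"):
        problems.append("process version mismatch")

    # Every step used in the run must be declared in the process.
    known_steps = {step.get("id") for step in process.findall("step")}
    for run in analysis.findall("run"):
        if run.get("step") not in known_steps:
            problems.append(f"unknown step: {run.get('step')}")
    return problems


if __name__ == "__main__":
    print(validate(PROCESS_XML, ANALYSIS_XML))
```

Keeping tool names and versions only in the Process document means a new run needs nothing beyond its inputs and outputs, which is one way to read the abstract's claim that manual documentation effort is reduced.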