A general concept for consistent documentation of computational analyses

The ever-growing amount of data in the life sciences demands standardized ways of high-throughput computational analysis. This standardization requires thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However,...

Bibliographic Details
Main authors: Ebert, Peter, Müller, Fabian, Nordström, Karl, Lengauer, Thomas, Schulz, Marcel H.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460408/
https://www.ncbi.nlm.nih.gov/pubmed/26055099
http://dx.doi.org/10.1093/database/bav050
_version_ 1782375381008908288
author Ebert, Peter
Müller, Fabian
Nordström, Karl
Lengauer, Thomas
Schulz, Marcel H.
author_facet Ebert, Peter
Müller, Fabian
Nordström, Karl
Lengauer, Thomas
Schulz, Marcel H.
author_sort Ebert, Peter
collection PubMed
description The ever-growing amount of data in the life sciences demands standardized ways of high-throughput computational analysis. This standardization requires thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, researchers provide simple flat files or other unstructured text documents as documentation, which often lack software dependencies, versions and enough detail to understand the workflow and parameter settings. As a solution, we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all details necessary to reproduce the results. In this design, we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to a minimum. Our approach is independent of a specific software environment, results in human-readable XML documents that can easily be shared with other researchers, and allows automated validation to ensure consistency of the metadata. Because our approach has been designed with few or no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip
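The two-part scheme described above can be illustrated with a minimal sketch. The record does not show the authors' actual XML schema or the contents of the pyvalid package, so every element name, attribute name, tool name and version number below is a hypothetical stand-in, and the `validate` function is only one plausible form of consistency check between a Process document and an Analysis document.

```python
import xml.etree.ElementTree as ET

# Hypothetical Process document: written once, it declares the tools,
# their versions and the parameters the pipeline expects.
PROCESS_XML = """
<process name="peak-calling" version="1.0">
  <tool name="aligner" version="0.7.10"/>
  <parameter name="min_quality"/>
  <parameter name="genome_build"/>
</process>
"""

# Hypothetical Analysis document: written per run, it only records
# which process was used and the concrete parameter values.
ANALYSIS_XML = """
<analysis process="peak-calling" process_version="1.0">
  <value parameter="min_quality">20</value>
  <value parameter="genome_build">hg19</value>
</analysis>
"""


def validate(process_xml, analysis_xml):
    """Return a list of inconsistencies; an empty list means consistent."""
    process = ET.fromstring(process_xml.strip())
    analysis = ET.fromstring(analysis_xml.strip())
    problems = []

    # The run must point at exactly the process (and version) it documents.
    referenced = (analysis.get("process"), analysis.get("process_version"))
    described = (process.get("name"), process.get("version"))
    if referenced != described:
        problems.append("analysis refers to a different process or version")

    # Every declared parameter needs a value, and no undeclared ones may appear.
    declared = {p.get("name") for p in process.findall("parameter")}
    supplied = {v.get("parameter") for v in analysis.findall("value")}
    for name in sorted(declared - supplied):
        problems.append(f"missing value for parameter '{name}'")
    for name in sorted(supplied - declared):
        problems.append(f"value for undeclared parameter '{name}'")
    return problems


if __name__ == "__main__":
    print(validate(PROCESS_XML, ANALYSIS_XML))
```

Keeping the run-specific values in a separate, much smaller document is what lets the Process description be reused unchanged across runs, which is the effort-saving property the abstract emphasizes.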
format Online
Article
Text
id pubmed-4460408
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4460408 2015-06-11 A general concept for consistent documentation of computational analyses Ebert, Peter Müller, Fabian Nordström, Karl Lengauer, Thomas Schulz, Marcel H. Database (Oxford) Original Article
Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip Oxford University Press 2015-06-08 /pmc/articles/PMC4460408/ /pubmed/26055099 http://dx.doi.org/10.1093/database/bav050 Text en © The Author(s) 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Article
Ebert, Peter
Müller, Fabian
Nordström, Karl
Lengauer, Thomas
Schulz, Marcel H.
A general concept for consistent documentation of computational analyses
title A general concept for consistent documentation of computational analyses
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460408/
https://www.ncbi.nlm.nih.gov/pubmed/26055099
http://dx.doi.org/10.1093/database/bav050