
A Link is not Enough – Reproducibility of Data

Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing acce...


Bibliographic Details
Main Authors: Pawlik, Mateusz; Hütter, Thomas; Kocher, Daniel; Mann, Willi; Augsten, Nikolaus
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6647556/
https://www.ncbi.nlm.nih.gov/pubmed/31402850
http://dx.doi.org/10.1007/s13222-019-00317-8
author Pawlik, Mateusz
Hütter, Thomas
Kocher, Daniel
Mann, Willi
Augsten, Nikolaus
collection PubMed
description Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices.
format Online
Article
Text
id pubmed-6647556
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-6647556
2019-08-09
A Link is not Enough – Reproducibility of Data
Pawlik, Mateusz
Hütter, Thomas
Kocher, Daniel
Mann, Willi
Augsten, Nikolaus
Datenbank Spektrum
Schwerpunktbeitrag
Springer Berlin Heidelberg
2019-06-13
2019
/pmc/articles/PMC6647556/
/pubmed/31402850
http://dx.doi.org/10.1007/s13222-019-00317-8
Text
en
© The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
title A Link is not Enough – Reproducibility of Data
topic Schwerpunktbeitrag