A Link is not Enough – Reproducibility of Data
Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing acce...
Main Authors: | Pawlik, Mateusz; Hütter, Thomas; Kocher, Daniel; Mann, Willi; Augsten, Nikolaus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2019 |
Subjects: | Schwerpunktbeitrag |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6647556/ https://www.ncbi.nlm.nih.gov/pubmed/31402850 http://dx.doi.org/10.1007/s13222-019-00317-8 |
_version_ | 1783437747971686400 |
---|---|
author | Pawlik, Mateusz Hütter, Thomas Kocher, Daniel Mann, Willi Augsten, Nikolaus |
collection | PubMed |
description | Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices. |
format | Online Article Text |
id | pubmed-6647556 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Springer Berlin Heidelberg |
record_format | MEDLINE/PubMed |
spelling | pubmed-66475562019-08-09 A Link is not Enough – Reproducibility of Data Pawlik, Mateusz Hütter, Thomas Kocher, Daniel Mann, Willi Augsten, Nikolaus Datenbank Spektrum Schwerpunktbeitrag Although many works in the database community use open data in their experimental evaluation, repeating the empirical results of previous works remains a challenge. This holds true even if the source code or binaries of the tested algorithms are available. In this paper, we argue that providing access to the raw, original datasets is not enough. Real-world datasets are rarely processed without modification. Instead, the data is adapted to the needs of the experimental evaluation in the data preparation process. We showcase that the details of the data preparation process matter and subtle differences during data conversion can have a large impact on the outcome of runtime results. We introduce a data reproducibility model, identify three levels of data reproducibility, report about our own experience, and exemplify our best practices. Springer Berlin Heidelberg 2019-06-13 2019 /pmc/articles/PMC6647556/ /pubmed/31402850 http://dx.doi.org/10.1007/s13222-019-00317-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
title | A Link is not Enough – Reproducibility of Data |
topic | Schwerpunktbeitrag |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6647556/ https://www.ncbi.nlm.nih.gov/pubmed/31402850 http://dx.doi.org/10.1007/s13222-019-00317-8 |