Dataset Reuse: Toward Translating Principles to Practice
The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper,...
Main Authors: | Koesten, Laura; Vougiouklis, Pavlos; Simperl, Elena; Groth, Paul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691392/ https://www.ncbi.nlm.nih.gov/pubmed/33294873 http://dx.doi.org/10.1016/j.patter.2020.100136 |
_version_ | 1783614280517550080 |
---|---|
author | Koesten, Laura Vougiouklis, Pavlos Simperl, Elena Groth, Paul |
author_facet | Koesten, Laura Vougiouklis, Pavlos Simperl, Elena Groth, Paul |
author_sort | Koesten, Laura |
collection | PubMed |
description | The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files, from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and actionable insights that allow data publishers and tools designers to implement functionalities that provably facilitate reuse. |
format | Online Article Text |
id | pubmed-7691392 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-76913922020-12-07 Dataset Reuse: Toward Translating Principles to Practice Koesten, Laura Vougiouklis, Pavlos Simperl, Elena Groth, Paul Patterns (N Y) Article The web provides access to millions of datasets that can have additional impact when used beyond their original context. We have little empirical insight into what makes a dataset more reusable than others and which of the existing guidelines and frameworks, if any, make a difference. In this paper, we explore potential reuse features through a literature review and present a case study on datasets on GitHub, a popular open platform for sharing code and data. We describe a corpus of more than 1.4 million data files, from over 65,000 repositories. Using GitHub's engagement metrics as proxies for dataset reuse, we relate them to reuse features from the literature and devise an initial model, using deep neural networks, to predict a dataset's reusability. This demonstrates the practical gap between principles and actionable insights that allow data publishers and tools designers to implement functionalities that provably facilitate reuse. Elsevier 2020-11-04 /pmc/articles/PMC7691392/ /pubmed/33294873 http://dx.doi.org/10.1016/j.patter.2020.100136 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Koesten, Laura Vougiouklis, Pavlos Simperl, Elena Groth, Paul Dataset Reuse: Toward Translating Principles to Practice |
title | Dataset Reuse: Toward Translating Principles to Practice |
title_full | Dataset Reuse: Toward Translating Principles to Practice |
title_fullStr | Dataset Reuse: Toward Translating Principles to Practice |
title_full_unstemmed | Dataset Reuse: Toward Translating Principles to Practice |
title_short | Dataset Reuse: Toward Translating Principles to Practice |
title_sort | dataset reuse: toward translating principles to practice |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691392/ https://www.ncbi.nlm.nih.gov/pubmed/33294873 http://dx.doi.org/10.1016/j.patter.2020.100136 |
work_keys_str_mv | AT koestenlaura datasetreusetowardtranslatingprinciplestopractice AT vougiouklispavlos datasetreusetowardtranslatingprinciplestopractice AT simperlelena datasetreusetowardtranslatingprinciplestopractice AT grothpaul datasetreusetowardtranslatingprinciplestopractice |