The reuse of public datasets in the life sciences: potential risks and rewards

The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues...

Bibliographic Details
Main Authors: Sielemann, Katharina, Hafner, Alenka, Pucker, Boas
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518187/
https://www.ncbi.nlm.nih.gov/pubmed/33024631
http://dx.doi.org/10.7717/peerj.9954
_version_ 1783587355201896448
author Sielemann, Katharina
Hafner, Alenka
Pucker, Boas
author_facet Sielemann, Katharina
Hafner, Alenka
Pucker, Boas
author_sort Sielemann, Katharina
collection PubMed
description The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences.
format Online
Article
Text
id pubmed-7518187
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7518187 2020-10-05 The reuse of public datasets in the life sciences: potential risks and rewards Sielemann, Katharina Hafner, Alenka Pucker, Boas PeerJ Bioinformatics The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences. PeerJ Inc. 2020-09-22 /pmc/articles/PMC7518187/ /pubmed/33024631 http://dx.doi.org/10.7717/peerj.9954 Text en © 2020 Sielemann et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Sielemann, Katharina
Hafner, Alenka
Pucker, Boas
The reuse of public datasets in the life sciences: potential risks and rewards
title The reuse of public datasets in the life sciences: potential risks and rewards
title_full The reuse of public datasets in the life sciences: potential risks and rewards
title_fullStr The reuse of public datasets in the life sciences: potential risks and rewards
title_full_unstemmed The reuse of public datasets in the life sciences: potential risks and rewards
title_short The reuse of public datasets in the life sciences: potential risks and rewards
title_sort reuse of public datasets in the life sciences: potential risks and rewards
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518187/
https://www.ncbi.nlm.nih.gov/pubmed/33024631
http://dx.doi.org/10.7717/peerj.9954