The reuse of public datasets in the life sciences: potential risks and rewards
The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues...
Main authors: | Sielemann, Katharina; Hafner, Alenka; Pucker, Boas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518187/ https://www.ncbi.nlm.nih.gov/pubmed/33024631 http://dx.doi.org/10.7717/peerj.9954 |
_version_ | 1783587355201896448 |
---|---|
author | Sielemann, Katharina; Hafner, Alenka; Pucker, Boas |
author_facet | Sielemann, Katharina; Hafner, Alenka; Pucker, Boas |
author_sort | Sielemann, Katharina |
collection | PubMed |
description | The ‘big data’ revolution has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the associated challenges, limitations and risks. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings. By using selected examples of successful reuse from different disciplines, we illustrate the enormous potential of the practice, while acknowledging the respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of this practice as a norm has the potential to benefit all stakeholders in the life sciences. |
format | Online Article Text |
id | pubmed-7518187 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7518187 2020-10-05 The reuse of public datasets in the life sciences: potential risks and rewards Sielemann, Katharina; Hafner, Alenka; Pucker, Boas PeerJ Bioinformatics PeerJ Inc. 2020-09-22 /pmc/articles/PMC7518187/ /pubmed/33024631 http://dx.doi.org/10.7717/peerj.9954 Text en © 2020 Sielemann et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | The reuse of public datasets in the life sciences: potential risks and rewards |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7518187/ https://www.ncbi.nlm.nih.gov/pubmed/33024631 http://dx.doi.org/10.7717/peerj.9954 |