A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation

Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept...

Bibliographic Details
Main author: Quintana, Daniel S
Format: Online Article Text
Language: English
Published: eLife Sciences Publications, Ltd 2020
Subjects: Human Biology and Medicine
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7112950/
https://www.ncbi.nlm.nih.gov/pubmed/32159513
http://dx.doi.org/10.7554/eLife.53275
Collection: PubMed
Description: Open research data provide considerable scientific, societal, and economic benefits. However, disclosure risks can sometimes limit the sharing of open data, especially in datasets that include sensitive details or information from individuals with rare disorders. This article introduces the concept of synthetic datasets, which is an emerging method originally developed to permit the sharing of confidential census data. Synthetic datasets mimic real datasets by preserving their statistical properties and the relationships between variables. Importantly, this method also reduces disclosure risk to essentially nil as no record in the synthetic dataset represents a real individual. This practical guide with accompanying R script enables biobehavioural researchers to create synthetic datasets and assess their utility via the synthpop R package. By sharing synthetic datasets that mimic original datasets that could not otherwise be made open, researchers can ensure the reproducibility of their results and facilitate data exploration while maintaining participant privacy.
Record ID: pubmed-7112950
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: eLife (Human Biology and Medicine)
Publication date: 2020-03-11
Copyright: © 2020, Quintana. This article is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use and redistribution provided that the original author and source are credited.
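The description's central claim, that a synthetic dataset preserves the statistical properties and relationships of the original while no synthetic record represents a real individual, can be illustrated with a deliberately simple sketch. This is not the synthpop method: synthpop is an R package whose default synthesis uses classification and regression trees (CART), whereas the Python below assumes two continuous, roughly normally distributed variables and draws fresh records from a fitted bivariate normal distribution. The function names `pearson`, `summarize`, and `synthesize` and the simulated input data are hypothetical, for illustration only.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(xs, ys):
    """Statistical properties the synthetic data should preserve."""
    return {
        "mean_x": statistics.fmean(xs),
        "mean_y": statistics.fmean(ys),
        "sd_x": statistics.stdev(xs),
        "sd_y": statistics.stdev(ys),
        "r": pearson(xs, ys),
    }


def synthesize(xs, ys, n, seed=None):
    """Draw n synthetic (x, y) records from a bivariate normal distribution
    fitted to the original data (assumption: both variables are continuous
    and roughly normal). Each record is a fresh random draw rather than a
    copy of an original row."""
    rng = random.Random(seed)
    s = summarize(xs, ys)
    r = s["r"]
    # Weight of independent noise; the clamp guards against |r| creeping past 1.
    resid = math.sqrt(max(0.0, 1.0 - r * r))
    syn_x, syn_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Sharing z1 between x and y induces correlation r (2x2 Cholesky factor).
        syn_x.append(s["mean_x"] + s["sd_x"] * z1)
        syn_y.append(s["mean_y"] + s["sd_y"] * (r * z1 + resid * z2))
    return syn_x, syn_y


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for a confidential dataset: simulated, not real data.
    orig_x = [rng.gauss(50, 10) for _ in range(500)]
    orig_y = [0.6 * x + rng.gauss(0, 8) for x in orig_x]
    syn_x, syn_y = synthesize(orig_x, orig_y, n=500, seed=2)
    # Crude utility check: compare the summary statistics side by side.
    orig_stats, syn_stats = summarize(orig_x, orig_y), summarize(syn_x, syn_y)
    for key in orig_stats:
        print(f"{key}: original={orig_stats[key]:.2f} synthetic={syn_stats[key]:.2f}")
```

Comparing these summaries is only a crude utility check: it covers means, standard deviations, and a single linear correlation, and it says nothing about nonlinear relationships, categorical variables, or residual disclosure risk, which is where a dedicated tool such as synthpop is more appropriate.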