A Simple-to-Use R Package for Mimicking Study Data by Simulations

Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data. Objectives: The aim of this work is to int...

Bibliographic Details
Main Authors: Koliopanos, Giorgos, Ojeda, Francisco, Ziegler, Andreas
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462429/
https://www.ncbi.nlm.nih.gov/pubmed/36882158
http://dx.doi.org/10.1055/a-2048-7692
author Koliopanos, Giorgos
Ojeda, Francisco
Ziegler, Andreas
collection PubMed
description Background: Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure of the existing study data but differ from them. Objectives: The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo), which can be used to simulate data from existing study data for continuous, ordinal categorical, and dichotomous variables. Methods: The core idea is to combine the rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal distribution and transformed back to the original scale of the variables. Unique features of modgo are that it allows changing the correlation between variables, performing perturbation analysis, handling multicenter data, and changing inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo. Results: modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated in several expansions. Conclusion: The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits the simulation of truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.
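The pipeline summarized under Methods (rank inverse normal transformation, correlation matrix, multivariate normal simulation, back-transformation) can be illustrated with a minimal sketch. This is not modgo's code or API: it is written in standard-library Python rather than R, it handles only continuous variables without ties, and every function name (`rank_inverse_normal`, `pearson`, `cholesky`, `back_transform`, `mimic`), the `(rank - 0.5) / n` rank offset, and the empirical-quantile back-transformation are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def rank_inverse_normal(values):
    """Replace each value by the normal quantile of its rank.
    Assumes no ties; the (rank - 0.5) / n offset is one common choice."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = STD_NORMAL.inv_cdf((rank - 0.5) / n)
    return scores


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cholesky(m):
    """Lower-triangular factor L with L * L^T = m (m positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (m[i][i] - s) ** 0.5
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def back_transform(z, sorted_original):
    """Map a normal score to the original scale via the empirical quantile."""
    n = len(sorted_original)
    u = STD_NORMAL.cdf(z)
    return sorted_original[min(int(u * n), n - 1)]


def mimic(columns, n_sim, seed=0):
    """Simulate n_sim rows that mimic the given columns (lists of floats)."""
    rng = random.Random(seed)
    p = len(columns)
    scores = [rank_inverse_normal(c) for c in columns]
    corr = [[pearson(scores[i], scores[j]) for j in range(p)] for i in range(p)]
    low = cholesky(corr)
    sorted_cols = [sorted(c) for c in columns]
    sim = [[] for _ in range(p)]
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Correlated standard normal draws: z = L * e
        z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(p)]
        for i in range(p):
            sim[i].append(back_transform(z[i], sorted_cols[i]))
    return sim


# Hypothetical "study data": two correlated continuous variables.
data_rng = random.Random(1)
x = [data_rng.gauss(50, 10) for _ in range(200)]
y = [0.8 * a + data_rng.gauss(0, 5) for a in x]

sim_x, sim_y = mimic([x, y], n_sim=500, seed=2)
print(round(pearson(x, y), 2), round(pearson(sim_x, sim_y), 2))
```

In this sketch the back-transformation reuses observed values, so each simulated value already occurs in the original column; only the combinations across variables are new. The ordinal, dichotomous, perturbation, and multicenter features named in the abstract are not covered here.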
format Online
Article
Text
id pubmed-10462429
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Georg Thieme Verlag KG
record_format MEDLINE/PubMed
spelling pubmed-10462429 2023-08-29 A Simple-to-Use R Package for Mimicking Study Data by Simulations Koliopanos, Giorgos Ojeda, Francisco Ziegler, Andreas Methods Inf Med Georg Thieme Verlag KG 2023-04-11 /pmc/articles/PMC10462429/ /pubmed/36882158 http://dx.doi.org/10.1055/a-2048-7692 Text en The Author(s).
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title A Simple-to-Use R Package for Mimicking Study Data by Simulations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462429/
https://www.ncbi.nlm.nih.gov/pubmed/36882158
http://dx.doi.org/10.1055/a-2048-7692