An overview of synthetic administrative data for research

Bibliographic Details
Main authors: Kokosi, Theodora; De Stavola, Bianca; Mitra, Robin; Frayling, Lora; Doherty, Aiden; Dove, Iain; Sonnenberg, Pam; Harron, Katie
Format: Online article (text)
Language: English
Published: Swansea University, 2022
Subjects: Population Data Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10464868/
https://www.ncbi.nlm.nih.gov/pubmed/37650026
http://dx.doi.org/10.23889/ijpds.v7i1.1727
author Kokosi, Theodora
De Stavola, Bianca
Mitra, Robin
Frayling, Lora
Doherty, Aiden
Dove, Iain
Sonnenberg, Pam
Harron, Katie
collection PubMed
description Use of administrative data for research and for planning services has increased over recent decades because of the value of the large volumes of rich information these data provide. However, concerns about the release of sensitive or personal data and the associated disclosure risk can lead to lengthy approval processes and restricted data access. This can delay or prevent the production of timely evidence. A promising solution to facilitate more efficient data access is to create synthetic versions of the original datasets that are less likely to hold confidential information and can minimise disclosure risk. Such data may be used as an interim solution, allowing researchers to develop their analysis plans on non-disclosive data whilst waiting for access to the real data. We aim to provide an overview of the background and uses of synthetic data and to describe common methods used to generate synthetic data in the context of UK administrative research. We propose a simplified terminology for categories of synthetic data (univariate, multivariate, and complex modality synthetic data), provide a more comprehensive description of the terminology used in the existing literature, and illustrate challenges and future directions for research.
format Online
Article
Text
id pubmed-10464868
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Swansea University
record_format MEDLINE/PubMed
spelling pubmed-10464868 2023-08-30
Int J Popul Data Sci, Population Data Science
Swansea University 2022-05-23
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title An overview of synthetic administrative data for research
topic Population Data Science