Constructing synthetic populations in the age of big data

BACKGROUND: To develop public health intervention models using micro-simulations, extensive personal information about inhabitants is needed, such as socio-demographic, economic and health figures. Confidentiality is an essential characteristic of such data, while the data should reflect realistic s...

Bibliographic Details
Main Authors:	Nicolaie, Mioara A., Füssenich, Koen, Ameling, Caroline, Boshuizen, Hendriek C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10617102/ https://www.ncbi.nlm.nih.gov/pubmed/37907904 http://dx.doi.org/10.1186/s12963-023-00319-5

author	Nicolaie, Mioara A. Füssenich, Koen Ameling, Caroline Boshuizen, Hendriek C.
collection	PubMed
description	BACKGROUND: To develop public health intervention models using micro-simulations, extensive personal information about inhabitants is needed, such as socio-demographic, economic and health figures. Confidentiality is an essential characteristic of such data, while the data should reflect realistic scenarios. Such data can be collected only in secured environments and are not directly available for open-source micro-simulation models. The aim of this paper is to illustrate a method of constructing synthetic data by predicting individual features through models based on confidential data on health and socio-economic determinants of the entire Dutch population. METHODS: Administrative records and health registry data were linked to socio-economic characteristics and self-reported lifestyle factors. For the entire Dutch population (n = 16,778,708), all socio-demographic information except lifestyle factors was available. Lifestyle factors were available from the 2012 Dutch Health Monitor (n = 370,835). Regression modelling was used to sequentially predict individual features. RESULTS: The synthetic population resembles the original confidential population. Features predicted in the first stages of the sequential procedure are virtually identical to those in the original population, while those predicted in later stages accumulate the limitations of data quality and of previously modelled features. CONCLUSIONS: By combining socio-demographic, economic, health and lifestyle-related data at the individual level on a large scale, our method provides a powerful tool to construct a synthetic population of good quality and with no confidentiality issues.
format	Online Article Text
id	pubmed-10617102
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10617102 2023-11-01 Constructing synthetic populations in the age of big data Nicolaie, Mioara A. Füssenich, Koen Ameling, Caroline Boshuizen, Hendriek C. Popul Health Metr Research
BioMed Central 2023-10-31 /pmc/articles/PMC10617102/ /pubmed/37907904 http://dx.doi.org/10.1186/s12963-023-00319-5 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Constructing synthetic populations in the age of big data
topic	Research
