Microsimulation of an educational attainment register to predict future record linkage quality
INTRODUCTION: Population wide educational attainment registers are necessary for educational planning and research. Regular linking of databases is needed to build and update such a register. Without availability of unique national identification numbers, record linkage must be based on quasi-identi...
Main Authors: | Schnell, Rainer; Weiand, Severin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Swansea University 2023 |
Subjects: | Population Data Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463005/ https://www.ncbi.nlm.nih.gov/pubmed/37649490 http://dx.doi.org/10.23889/ijpds.v8i1.2122 |
_version_ | 1785098157716144128 |
---|---|
author | Schnell, Rainer; Weiand, Severin |
author_facet | Schnell, Rainer; Weiand, Severin |
author_sort | Schnell, Rainer |
collection | PubMed |
description | INTRODUCTION: Population wide educational attainment registers are necessary for educational planning and research. Regular linking of databases is needed to build and update such a register. Without availability of unique national identification numbers, record linkage must be based on quasi-identifiers such as name, date of birth and sex. However, the data protection principle of data minimization aims to minimize the set of identifiers in databases. OBJECTIVES: Therefore, the German Federal Ministry of Research and Education commissioned a study to inform legislation on the minimum set of identifiers required for a national educational register. METHODS: To justify our recommendations empirically, we implemented a microsimulation of about 20 million people. The simulated register accumulates changes and errors in identifiers due to migration, regional mobility, marriage, school career and mortality, thereby allowing the study of errors on longitudinal datasets. Updated records were linked yearly to the simulated register using several linkage methods. Clear-text methods as well as privacy-preserving (PPRL) methods were compared. RESULTS: The results indicate linkage bias if only the primary identifiers are available in the register. More detailed identifiers, including place of birth, are required to minimize linkage bias. The amount of information available to identify a person for matching is more critical for linkage quality than the record linkage method applied. Differences in linkage quality between the best procedures (probabilistic linkage and multiple matchkeys) are minor. CONCLUSIONS: Microsimulation is a valuable tool for designing record linkage procedures. By modelling the processes resulting in changes or errors in quasi-identifiers, predicting data quality to be expected after the implementation of a register seems possible. |
format | Online Article Text |
id | pubmed-10463005 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Swansea University |
record_format | MEDLINE/PubMed |
spelling | pubmed-10463005 2023-08-30 Microsimulation of an educational attainment register to predict future record linkage quality Schnell, Rainer Weiand, Severin Int J Popul Data Sci Population Data Science INTRODUCTION: Population wide educational attainment registers are necessary for educational planning and research. Regular linking of databases is needed to build and update such a register. Without availability of unique national identification numbers, record linkage must be based on quasi-identifiers such as name, date of birth and sex. However, the data protection principle of data minimization aims to minimize the set of identifiers in databases. OBJECTIVES: Therefore, the German Federal Ministry of Research and Education commissioned a study to inform legislation on the minimum set of identifiers required for a national educational register. METHODS: To justify our recommendations empirically, we implemented a microsimulation of about 20 million people. The simulated register accumulates changes and errors in identifiers due to migration, regional mobility, marriage, school career and mortality, thereby allowing the study of errors on longitudinal datasets. Updated records were linked yearly to the simulated register using several linkage methods. Clear-text methods as well as privacy-preserving (PPRL) methods were compared. RESULTS: The results indicate linkage bias if only the primary identifiers are available in the register. More detailed identifiers, including place of birth, are required to minimize linkage bias. The amount of information available to identify a person for matching is more critical for linkage quality than the record linkage method applied. Differences in linkage quality between the best procedures (probabilistic linkage and multiple matchkeys) are minor. CONCLUSIONS: Microsimulation is a valuable tool for designing record linkage procedures. By modelling the processes resulting in changes or errors in quasi-identifiers, predicting data quality to be expected after the implementation of a register seems possible. Swansea University 2023-04-03 /pmc/articles/PMC10463005/ /pubmed/37649490 http://dx.doi.org/10.23889/ijpds.v8i1.2122 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. |
spellingShingle | Population Data Science Schnell, Rainer Weiand, Severin Microsimulation of an educational attainment register to predict future record linkage quality |
title | Microsimulation of an educational attainment register to predict future record linkage quality |
title_full | Microsimulation of an educational attainment register to predict future record linkage quality |
title_fullStr | Microsimulation of an educational attainment register to predict future record linkage quality |
title_full_unstemmed | Microsimulation of an educational attainment register to predict future record linkage quality |
title_short | Microsimulation of an educational attainment register to predict future record linkage quality |
title_sort | microsimulation of an educational attainment register to predict future record linkage quality |
topic | Population Data Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10463005/ https://www.ncbi.nlm.nih.gov/pubmed/37649490 http://dx.doi.org/10.23889/ijpds.v8i1.2122 |