Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study

BACKGROUND: In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilita...

Bibliographic Details
Main authors: Karmel, Rosemary; Anderson, Phil; Gibson, Diane; Peut, Ann; Duckett, Stephen; Wells, Yvonne
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2842267/
https://www.ncbi.nlm.nih.gov/pubmed/20167118
http://dx.doi.org/10.1186/1472-6963-10-41
_version_ 1782179184980787200
author Karmel, Rosemary
Anderson, Phil
Gibson, Diane
Peut, Ann
Duckett, Stephen
Wells, Yvonne
author_sort Karmel, Rosemary
collection PubMed
description BACKGROUND: In Australia, many community service program data collections developed over the last decade, including several for aged care programs, contain a statistical linkage key (SLK) to enable derivation of client-level data. In addition, a common SLK is now used in many collections to facilitate the statistical examination of cross-program use. In 2005, the Pathways in Aged Care (PIAC) cohort study was funded to create a linked aged care database using the common SLK to enable analysis of pathways through aged care services. Linkage using an SLK is commonly deterministic. The purpose of this paper is to describe an extended deterministic record linkage strategy for situations where there is a general person identifier (e.g. an SLK) and several additional variables suitable for data linkage. This approach can allow for variation in client information recorded on different databases.
METHODS: A stepwise deterministic record linkage algorithm was developed to link datasets using an SLK and several other variables. Three measures of likely match accuracy were used: the discriminating power of match key values, an estimated false match rate, and an estimated step-specific trade-off between true and false matches. The method was validated through examining link properties and clerical review of three samples of links.
RESULTS: The deterministic algorithm resulted in up to an 11% increase in links compared with simple deterministic matching using an SLK. The links identified are of high quality: validation samples showed that less than 0.5% of links were false positives, and very few matches were made using non-unique match information (0.01%). There was a high degree of consistency in the characteristics of linked events.
CONCLUSIONS: The linkage strategy described in this paper has allowed the linking of multiple large aged care service datasets using a statistical linkage key while allowing for variation in its reporting. More widely, our deterministic algorithm, based on statistical properties of match keys, is a useful addition to the linker's toolkit. In particular, it may prove attractive when insufficient data are available for clerical review or follow-up, and the researcher has fewer options in relation to probabilistic linkage.
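The stepwise deterministic idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not list the actual match keys or their order, so the field names (`slk`, `dob`, `sex`, `postcode`), the two-step sequence, the sample records, and the function `stepwise_link` are all hypothetical. The sketch assumes that each step links only records whose match key value is unique in both datasets, and that later steps see only records left unlinked by earlier ones.

```python
from collections import Counter


def stepwise_link(records_a, records_b, steps):
    """Link two record lists in passes; each pass uses one match key.

    `steps` is an ordered list of field-name tuples (hypothetical keys).
    Returns (index_in_a, index_in_b, step_number) tuples.
    """
    links = []
    linked_a, linked_b = set(), set()
    for step, fields in enumerate(steps, start=1):

        def key(rec):
            # A record with any missing field cannot be matched at this step.
            values = tuple(rec.get(f) for f in fields)
            return None if any(v is None for v in values) else values

        rem_a = [(i, key(r)) for i, r in enumerate(records_a) if i not in linked_a]
        rem_b = [(j, key(r)) for j, r in enumerate(records_b) if j not in linked_b]
        count_a = Counter(k for _, k in rem_a if k is not None)
        count_b = Counter(k for _, k in rem_b if k is not None)
        # Index only key values that occur exactly once in dataset B.
        index_b = {k: j for j, k in rem_b if k is not None and count_b[k] == 1}
        for i, k in rem_a:
            # Accept a link only if the key value is unique on both sides.
            if k is not None and count_a[k] == 1 and k in index_b:
                j = index_b[k]
                links.append((i, j, step))
                linked_a.add(i)
                linked_b.add(j)
    return links


# Made-up records: the second person's SLK differs between the two programs.
program_a = [
    {"slk": "ARMOH010119301", "dob": "1930-01-01", "sex": "F", "postcode": "3000"},
    {"slk": "MIHNA150619282", "dob": "1928-06-15", "sex": "M", "postcode": "2600"},
]
program_b = [
    {"slk": "ARMOH010119301", "dob": "1930-01-01", "sex": "F", "postcode": "3000"},
    {"slk": "MITNA150619282", "dob": "1928-06-15", "sex": "M", "postcode": "2600"},
]

# Step 1: exact SLK; step 2: a looser key for records the SLK step missed.
steps = [("slk",), ("dob", "sex", "postcode")]
links = stepwise_link(program_a, program_b, steps)
print(links)
```

In this toy data the first pair links on the SLK alone, while the second pair links only at the looser second step, which is the kind of allowance for variation in recorded client information that the abstract describes. Recording the step number with each link makes it possible to review link quality step by step.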
format Text
id pubmed-2842267
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2842267 2010-03-20 Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study Karmel, Rosemary; Anderson, Phil; Gibson, Diane; Peut, Ann; Duckett, Stephen; Wells, Yvonne BMC Health Serv Res Research article BioMed Central 2010-02-18 /pmc/articles/PMC2842267/ /pubmed/20167118 http://dx.doi.org/10.1186/1472-6963-10-41 Text en Copyright ©2010 Karmel et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Empirical aspects of record linkage across multiple data sets using statistical linkage keys: the experience of the PIAC cohort study
topic Research article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2842267/
https://www.ncbi.nlm.nih.gov/pubmed/20167118
http://dx.doi.org/10.1186/1472-6963-10-41