
Implementation and validation of a probabilistic linkage method for population databases without identification variables

Linking records of the same person from different sources makes it possible to build administrative cohorts and perform longitudinal analyses as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We implemented the Felleg...


Bibliographic Details
Main authors: Quezada-Sánchez, Amado D., Espín-Arellano, Iván, Morales-Carmona, Evangelina, Molina-Vélez, Diana, Palacio-Mejía, Lina Sofía, González-González, Edgar Leonel, Alvarez Aceves, Mariana, Hernández-Ávila, Juan Eugenio
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793263/
https://www.ncbi.nlm.nih.gov/pubmed/36582715
http://dx.doi.org/10.1016/j.heliyon.2022.e12311
author Quezada-Sánchez, Amado D.
Espín-Arellano, Iván
Morales-Carmona, Evangelina
Molina-Vélez, Diana
Palacio-Mejía, Lina Sofía
González-González, Edgar Leonel
Alvarez Aceves, Mariana
Hernández-Ávila, Juan Eugenio
collection PubMed
description Linking records of the same person from different sources makes it possible to build administrative cohorts and perform longitudinal analyses as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We applied the Fellegi-Sunter probabilistic linkage method to a sample of records from the Mexican Automated System for Hospital Discharges and the Statistical and Epidemiological System for Deaths and evaluated its performance. The records in each source were randomly divided into a training sample (25%) and a validation sample (75%). We evaluated different types of blocking in terms of complexity reduction and pairs completeness, and record linkage in terms of sensitivity and positive predictive value. In the validation sample, a blocking scheme based on trigrams of the full name achieved 95.76% pairs completeness and 99.9996% complexity reduction. After pairs classification, we achieved a sensitivity of 90.72% and a positive predictive value of 97.10% in the validation sample. Both values were about one percentage point higher than those obtained with automatic classification without clerical review of potential pairs. We concluded that the linkage algorithm achieved good performance in terms of sensitivity and positive predictive value and can be used to build administrative cohorts for the epidemiological analysis of populations with records in health information systems.
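The evaluation measures named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation, which this record does not show. The function names, the toy records, and the `min_shared` threshold are all hypothetical. The sketch assumes the common definitions: pairs completeness is the share of true matches that survive blocking, and complexity reduction is one minus the ratio of candidate pairs to all possible pairs.

```python
from collections import defaultdict


def trigrams(name):
    """Return the set of character trigrams of a lower-cased, whitespace-normalized name."""
    s = " ".join(name.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def block_by_trigrams(source_a, source_b, min_shared=2):
    """Return candidate (id_a, id_b) pairs whose names share at least
    `min_shared` trigrams. `min_shared` is an illustrative assumption."""
    # Inverted index: trigram -> ids in source_b containing it.
    index = defaultdict(set)
    for id_b, name in source_b.items():
        for gram in trigrams(name):
            index[gram].add(id_b)

    pairs = set()
    for id_a, name in source_a.items():
        shared = defaultdict(int)
        for gram in trigrams(name):
            for id_b in index.get(gram, ()):
                shared[id_b] += 1
        pairs.update((id_a, id_b) for id_b, n in shared.items() if n >= min_shared)
    return pairs


def pairs_completeness(candidates, true_matches):
    """Share of true matches retained among the candidate pairs."""
    return len(candidates & true_matches) / len(true_matches)


def complexity_reduction(candidates, n_a, n_b):
    """One minus the fraction of the n_a * n_b possible pairs that remain."""
    return 1 - len(candidates) / (n_a * n_b)


def sensitivity(predicted, true_matches):
    """Share of true matches that were classified as links."""
    return len(predicted & true_matches) / len(true_matches)


def positive_predictive_value(predicted, true_matches):
    """Share of classified links that are true matches."""
    return len(predicted & true_matches) / len(predicted)


# Toy records (invented for illustration only).
discharges = {"a1": "Maria Lopez Garcia", "a2": "Juan Perez Ruiz", "a3": "Ana Torres Diaz"}
deaths = {"b1": "Maria Lopes Garcia", "b2": "Juan Perez Ruiz", "b3": "Luis Ramos Vega"}
true_matches = {("a1", "b1"), ("a2", "b2")}

candidates = block_by_trigrams(discharges, deaths)
print(pairs_completeness(candidates, true_matches))
print(complexity_reduction(candidates, len(discharges), len(deaths)))

# Hypothetical classifier output: one correct link and one false link.
predicted = {("a2", "b2"), ("a3", "b3")}
print(sensitivity(predicted, true_matches))
print(positive_predictive_value(predicted, true_matches))
```

The inverted index avoids comparing every pair of records directly, which is the purpose of blocking. A real pipeline would then score the candidate pairs with a classifier such as Fellegi-Sunter instead of the fixed `predicted` set used here.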
format Online
Article
Text
id pubmed-9793263
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9793263 2022-12-28 Heliyon Research Article
Elsevier 2022-12-14 /pmc/articles/PMC9793263/ /pubmed/36582715 http://dx.doi.org/10.1016/j.heliyon.2022.e12311 Text en © 2022 Published by Elsevier Ltd. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Implementation and validation of a probabilistic linkage method for population databases without identification variables
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793263/
https://www.ncbi.nlm.nih.gov/pubmed/36582715
http://dx.doi.org/10.1016/j.heliyon.2022.e12311