Implementation and validation of a probabilistic linkage method for population databases without identification variables
Linking records of the same person from different sources makes it possible to build administrative cohorts and perform longitudinal analyses, as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We implemented the Felleg...
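Main authors: | Quezada-Sánchez, Amado D.; Espín-Arellano, Iván; Morales-Carmona, Evangelina; Molina-Vélez, Diana; Palacio-Mejía, Lina Sofía; González-González, Edgar Leonel; Alvarez Aceves, Mariana; Hernández-Ávila, Juan Eugenio |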
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793263/ https://www.ncbi.nlm.nih.gov/pubmed/36582715 http://dx.doi.org/10.1016/j.heliyon.2022.e12311 |
_version_ | 1784859817905487872 |
---|---|
author | Quezada-Sánchez, Amado D. Espín-Arellano, Iván Morales-Carmona, Evangelina Molina-Vélez, Diana Palacio-Mejía, Lina Sofía González-González, Edgar Leonel Alvarez Aceves, Mariana Hernández-Ávila, Juan Eugenio |
author_facet | Quezada-Sánchez, Amado D. Espín-Arellano, Iván Morales-Carmona, Evangelina Molina-Vélez, Diana Palacio-Mejía, Lina Sofía González-González, Edgar Leonel Alvarez Aceves, Mariana Hernández-Ávila, Juan Eugenio |
author_sort | Quezada-Sánchez, Amado D. |
collection | PubMed |
description | Linking records of the same person from different sources makes it possible to build administrative cohorts and perform longitudinal analyses, as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We implemented the Fellegi-Sunter probabilistic linkage method on a sample of records from the Mexican Automated System for Hospital Discharges and the Statistical and Epidemiological System for Deaths and evaluated its performance. The records in each source were randomly divided into a training sample (25%) and a validation sample (75%). We evaluated different types of blocking in terms of complexity reduction and pairs completeness, and record linkage in terms of sensitivity and positive predictive value. In the validation sample, a blocking scheme based on trigrams of the full name achieved 95.76% pairs completeness and 99.9996% complexity reduction. After pairs classification, we achieved a sensitivity of 90.72% and a positive predictive value of 97.10% in the validation sample. Both values were about one percentage point higher than those obtained in the automatic classification without clerical review of potential pairs. We concluded that the linkage algorithm achieved good performance in terms of sensitivity and positive predictive value and can be used to build administrative cohorts for the epidemiological analysis of populations with records in health information systems. |
format | Online Article Text |
id | pubmed-9793263 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9793263 2022-12-28 Implementation and validation of a probabilistic linkage method for population databases without identification variables Quezada-Sánchez, Amado D. Espín-Arellano, Iván Morales-Carmona, Evangelina Molina-Vélez, Diana Palacio-Mejía, Lina Sofía González-González, Edgar Leonel Alvarez Aceves, Mariana Hernández-Ávila, Juan Eugenio Heliyon Research Article Linking records of the same person from different sources makes it possible to build administrative cohorts and perform longitudinal analyses, as an alternative to traditional cohort studies, and has important practical implications for producing knowledge in public health. We implemented the Fellegi-Sunter probabilistic linkage method on a sample of records from the Mexican Automated System for Hospital Discharges and the Statistical and Epidemiological System for Deaths and evaluated its performance. The records in each source were randomly divided into a training sample (25%) and a validation sample (75%). We evaluated different types of blocking in terms of complexity reduction and pairs completeness, and record linkage in terms of sensitivity and positive predictive value. In the validation sample, a blocking scheme based on trigrams of the full name achieved 95.76% pairs completeness and 99.9996% complexity reduction. After pairs classification, we achieved a sensitivity of 90.72% and a positive predictive value of 97.10% in the validation sample. Both values were about one percentage point higher than those obtained in the automatic classification without clerical review of potential pairs. We concluded that the linkage algorithm achieved good performance in terms of sensitivity and positive predictive value and can be used to build administrative cohorts for the epidemiological analysis of populations with records in health information systems.
Elsevier 2022-12-14 /pmc/articles/PMC9793263/ /pubmed/36582715 http://dx.doi.org/10.1016/j.heliyon.2022.e12311 Text en © 2022 Published by Elsevier Ltd. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Quezada-Sánchez, Amado D. Espín-Arellano, Iván Morales-Carmona, Evangelina Molina-Vélez, Diana Palacio-Mejía, Lina Sofía González-González, Edgar Leonel Alvarez Aceves, Mariana Hernández-Ávila, Juan Eugenio Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title | Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title_full | Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title_fullStr | Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title_full_unstemmed | Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title_short | Implementation and validation of a probabilistic linkage method for population databases without identification variables |
title_sort | implementation and validation of a probabilistic linkage method for population databases without identification variables |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9793263/ https://www.ncbi.nlm.nih.gov/pubmed/36582715 http://dx.doi.org/10.1016/j.heliyon.2022.e12311 |
work_keys_str_mv | AT quezadasanchezamadod implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT espinarellanoivan implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT moralescarmonaevangelina implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT molinavelezdiana implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT palaciomejialinasofia implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT gonzalezgonzalezedgarleonel implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT alvarezacevesmariana implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables AT hernandezavilajuaneugenio implementationandvalidationofaprobabilisticlinkagemethodforpopulationdatabaseswithoutidentificationvariables |
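
The description above reports blocking quality as pairs completeness and complexity reduction, and linkage quality as sensitivity and positive predictive value. The following is a minimal sketch of how such measures might be computed on a toy example. It is not the authors' implementation: the function names, the rule of keeping pairs that share at least one name trigram, and the sample names are all assumptions made for illustration.

```python
from itertools import product


def trigrams(name: str) -> set[str]:
    """Character trigrams of a name after uppercasing and removing whitespace."""
    s = "".join(name.upper().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def candidate_pairs(names_a, names_b, min_shared=1):
    """Hypothetical blocking rule: keep (i, j) if the names share >= min_shared trigrams."""
    grams_a = [trigrams(n) for n in names_a]
    grams_b = [trigrams(n) for n in names_b]
    return {
        (i, j)
        for (i, ga), (j, gb) in product(enumerate(grams_a), enumerate(grams_b))
        if len(ga & gb) >= min_shared
    }


def complexity_reduction(n_candidates: int, n_a: int, n_b: int) -> float:
    """Share of the full cross product removed by blocking."""
    return 1 - n_candidates / (n_a * n_b)


def pairs_completeness(candidates: set, true_matches: set) -> float:
    """Share of true matches that survive blocking."""
    return len(candidates & true_matches) / len(true_matches)


def sensitivity(links: set, true_matches: set) -> float:
    """Share of true matches that were classified as links."""
    return len(links & true_matches) / len(true_matches)


def positive_predictive_value(links: set, true_matches: set) -> float:
    """Share of classified links that are true matches."""
    return len(links & true_matches) / len(links)


# Made-up names, with spelling variants between the two sources.
names_a = ["JUAN PEREZ LOPEZ", "MARIA GARCIA RUIZ"]
names_b = ["JUAN PERES LOPEZ", "ANA TORRES DIAZ", "MARIA GARCIA RUIS"]
true_matches = {(0, 0), (1, 2)}

candidates = candidate_pairs(names_a, names_b)
cr = complexity_reduction(len(candidates), len(names_a), len(names_b))
pc = pairs_completeness(candidates, true_matches)

# Hypothetical classifier output: one correct link and one wrong link.
links = {(0, 0), (1, 1)}
sens = sensitivity(links, true_matches)
ppv = positive_predictive_value(links, true_matches)

print(sorted(candidates), round(cr, 4), pc, sens, ppv)
```

In this toy data only the two true pairs share any trigram, so blocking keeps 2 of the 6 possible pairs without losing a match. On real data the threshold and the name normalization would need tuning, and comparing every pair as done here would not scale; an inverted index from trigram to record would be the usual replacement.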