Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil
BACKGROUND: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some c...
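Main authors: | Coeli, Claudia Medina; Saraceni, Valeria; Medeiros, Paulo Mota; da Silva Santos, Helena Pereira; Guillen, Luis Carlos Torres; Alves, Luís Guilherme Santos Buteri; Hone, Thomas; Millett, Christopher; Trajman, Anete; Durovni, Betina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204416/ https://www.ncbi.nlm.nih.gov/pubmed/34130670 http://dx.doi.org/10.1186/s12911-021-01550-6 |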
_version_ | 1783708335839641600 |
---|---|
author | Coeli, Claudia Medina Saraceni, Valeria Medeiros, Paulo Mota da Silva Santos, Helena Pereira Guillen, Luis Carlos Torres Alves, Luís Guilherme Santos Buteri Hone, Thomas Millett, Christopher Trajman, Anete Durovni, Betina |
author_sort | Coeli, Claudia Medina |
collection | PubMed |
description | BACKGROUND: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some characteristics of Brazilian names that make the linkage process prone to errors. This study aims to describe and evaluate the quality of the processes used to create an individual-linked database for data-intensive research on the impacts on health indicators of the expansion of primary care in Rio de Janeiro City, Brazil. METHODS: We created an individual-level dataset linking social benefits recipients, primary health care, hospital admission and mortality data. The databases were pre-processed, and we adopted a multiple-approach strategy combining deterministic and probabilistic record linkage techniques, and an extensive clerical review of the potential matches. Relying on manual review as the gold standard, we estimated the false match (false-positive) proportion of each approach (deterministic, probabilistic, clerical review) and the missed match (false-negative) proportion of the clerical review approach. To assess the sensitivity (recall) in identifying social benefits recipients’ deaths, we used their vital status registered on the primary care database as the gold standard. RESULTS: In all linkage processes, the deterministic approach identified most of the matches. However, the proportion of matches identified in each approach varied. The false match proportion was around 1% or less in almost all approaches. The missed match proportion in the clerical review approach of all linkage processes was under 3%. We estimated a recall of 93.6% (95% CI 92.8–94.3) for the linkage between social benefits recipients and mortality data. CONCLUSION: The adoption of a linkage strategy combining pre-processing routines, deterministic, and probabilistic strategies, as well as an extensive clerical review approach minimized linkage errors in the context of suboptimal data quality. |
format | Online Article Text |
id | pubmed-8204416 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8204416 2021-06-16 BMC Med Inform Decis Mak Research Article BioMed Central 2021-06-15 /pmc/articles/PMC8204416/ /pubmed/34130670 http://dx.doi.org/10.1186/s12911-021-01550-6 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204416/ https://www.ncbi.nlm.nih.gov/pubmed/34130670 http://dx.doi.org/10.1186/s12911-021-01550-6 |
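
The abstract reports three linkage quality measures: the false match proportion, the missed match proportion, and recall with a 95% confidence interval. The record does not give the underlying counts or say how the interval was computed, so the following is only a minimal sketch of how such measures could be calculated. The function names, the use of a Wilson score interval, and the counts in the usage lines are all assumptions made for illustration; they are not taken from the article.

```python
import math


def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator after basic sanity checks."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must lie between 0 and denominator")
    return numerator / denominator


def false_match_proportion(false_matches: int, links_reviewed: int) -> float:
    """Share of reviewed links that the gold standard rejects (false positives)."""
    return proportion(false_matches, links_reviewed)


def missed_match_proportion(missed_matches: int, true_matches: int) -> float:
    """Share of true matches that the linkage failed to find (false negatives)."""
    return proportion(missed_matches, true_matches)


def recall(matches_found: int, gold_standard_positives: int) -> float:
    """Sensitivity: true matches found divided by all gold-standard positives."""
    return proportion(matches_found, gold_standard_positives)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: the article does not state its interval method; Wilson is
    used here only as one common choice. z = 1.96 gives roughly 95% coverage.
    """
    p = proportion(successes, total)
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    found, positives = 900, 1000
    low, high = wilson_interval(found, positives)
    print(f"recall = {recall(found, positives):.3f} (95% CI {low:.3f} to {high:.3f})")
    print(f"false match proportion = {false_match_proportion(5, 500):.3f}")
    print(f"missed match proportion = {missed_match_proportion(12, 500):.3f}")
```

In this sketch the missed match proportion is the complement of recall when both are measured against the same gold standard. The abstract uses different gold standards for the two (manual review versus vital status in the primary care database), so they are kept as separate functions.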