Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil

BACKGROUND: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some c...


Bibliographic Details
Main authors: Coeli, Claudia Medina, Saraceni, Valeria, Medeiros, Paulo Mota, da Silva Santos, Helena Pereira, Guillen, Luis Carlos Torres, Alves, Luís Guilherme Santos Buteri, Hone, Thomas, Millett, Christopher, Trajman, Anete, Durovni, Betina
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204416/
https://www.ncbi.nlm.nih.gov/pubmed/34130670
http://dx.doi.org/10.1186/s12911-021-01550-6
_version_ 1783708335839641600
author Coeli, Claudia Medina
Saraceni, Valeria
Medeiros, Paulo Mota
da Silva Santos, Helena Pereira
Guillen, Luis Carlos Torres
Alves, Luís Guilherme Santos Buteri
Hone, Thomas
Millett, Christopher
Trajman, Anete
Durovni, Betina
author_sort Coeli, Claudia Medina
collection PubMed
description BACKGROUND: Linking Brazilian databases demands the development of algorithms and processes to deal with various challenges including the large size of the databases, the low number and poor quality of personal identifiers available to be compared (national security number not mandatory), and some characteristics of Brazilian names that make the linkage process prone to errors. This study aims to describe and evaluate the quality of the processes used to create an individual-linked database for data-intensive research on the impacts on health indicators of the expansion of primary care in Rio de Janeiro City, Brazil.
METHODS: We created an individual-level dataset linking social benefits recipients, primary health care, hospital admission and mortality data. The databases were pre-processed, and we adopted a multiple-approach strategy combining deterministic and probabilistic record linkage techniques, and an extensive clerical review of the potential matches. Relying on manual review as the gold standard, we estimated the false match (false-positive) proportion of each approach (deterministic, probabilistic, clerical review) and the missed match proportion (false-negative) of the clerical review approach. To assess the sensitivity (recall) in identifying social benefits recipients’ deaths, we used their vital status registered on the primary care database as the gold standard.
RESULTS: In all linkage processes, the deterministic approach identified most of the matches. However, the proportion of matches identified in each approach varied. The false match proportion was around 1% or less in almost all approaches. The missed match proportion in the clerical review approach of all linkage processes was under 3%. We estimated a recall of 93.6% (95% CI 92.8–94.3) for the linkage between social benefits recipients and mortality data.
CONCLUSION: The adoption of a linkage strategy combining pre-processing routines, deterministic, and probabilistic strategies, as well as an extensive clerical review approach minimized linkage errors in the context of suboptimal data quality.
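The multi-pass strategy described in the abstract (pre-processing, a deterministic pass, a score-based pass, and referral of borderline pairs to clerical review) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the identifiers compared or the software used, so the field names (`name`, `mother`, `dob`), the thresholds, and the use of a plain string-similarity score in place of a true probabilistic (Fellegi-Sunter style) model are all assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Pre-processing: strip accents, uppercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.upper().split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for probabilistic match weights."""
    return SequenceMatcher(None, a, b).ratio()


def link(record, candidates, accept=0.92, review=0.80):
    """Return (status, candidate) for one record against a candidate list.

    The hypothetical fields 'name', 'mother' and 'dob' are illustrative only.
    """
    name = normalize(record["name"])
    mother = normalize(record["mother"])
    dob = record["dob"]

    # Pass 1 (deterministic): exact agreement on every normalized identifier.
    for cand in candidates:
        key = (normalize(cand["name"]), normalize(cand["mother"]), cand["dob"])
        if key == (name, mother, dob):
            return "deterministic", cand

    # Pass 2 (score-based): block on date of birth, then score the names.
    best, best_score = None, 0.0
    for cand in candidates:
        if cand["dob"] != dob:
            continue
        score = (
            similarity(name, normalize(cand["name"]))
            + similarity(mother, normalize(cand["mother"]))
        ) / 2
        if score > best_score:
            best, best_score = cand, score

    if best_score >= accept:
        return "probabilistic", best
    if best_score >= review:
        # Borderline pairs are not accepted automatically; a person decides.
        return "clerical_review", best
    return "no_match", None
```

Running the cheap exact pass first mirrors the finding that the deterministic approach identified most of the matches, leaving a smaller set of records for scoring and manual review. The thresholds here are arbitrary and would need to be calibrated against a manually reviewed sample.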
format Online
Article
Text
id pubmed-8204416
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8204416 2021-06-16
BMC Med Inform Decis Mak
Research Article
BioMed Central 2021-06-15
/pmc/articles/PMC8204416/
/pubmed/34130670
http://dx.doi.org/10.1186/s12911-021-01550-6
Text en
© The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Record linkage under suboptimal conditions for data-intensive evaluation of primary care in Rio de Janeiro, Brazil
topic Research Article