Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania

BACKGROUND: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent...

Bibliographic Details
Main Authors: Rentsch, Christopher T., Harron, Katie, Urassa, Mark, Todd, Jim, Reniers, Georges, Zaba, Basia
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288858/
https://www.ncbi.nlm.nih.gov/pubmed/30526518
http://dx.doi.org/10.1186/s12874-018-0632-5
_version_ 1783379871744917504
author Rentsch, Christopher T.
Harron, Katie
Urassa, Mark
Todd, Jim
Reniers, Georges
Zaba, Basia
author_facet Rentsch, Christopher T.
Harron, Katie
Urassa, Mark
Todd, Jim
Reniers, Georges
Zaba, Basia
author_sort Rentsch, Christopher T.
collection PubMed
description BACKGROUND: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania. METHODS: Semi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates. RESULTS: Between 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of coefficient estimate (R² = 0.97; p = 0.03). CONCLUSIONS: Similar to studies with more negligible levels of linkage errors, false matches in this setting reduced the magnitude of the association; missed matches reduced precision. Adjusting for these biases could provide more robust results using data with considerable linkage errors.
format Online
Article
Text
id pubmed-6288858
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6288858 2018-12-14 Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania Rentsch, Christopher T. Harron, Katie Urassa, Mark Todd, Jim Reniers, Georges Zaba, Basia BMC Med Res Methodol Research Article BACKGROUND: Studies based on high-quality linked data in developed countries show that even minor linkage errors, which occur when records of two different individuals are erroneously linked or when records belonging to the same individual are not linked, can impact bias and precision of subsequent analyses. We evaluated the impact of linkage quality on inferences drawn from analyses using data with substantial linkage errors in rural Tanzania. METHODS: Semi-automatic point-of-contact interactive record linkage was used to establish gold standard links between community-based HIV surveillance data and medical records at clinics serving the surveillance population. Automated probabilistic record linkage was used to create analytic datasets at minimum, low, medium, and high match score thresholds. Cox proportional hazards regression models were used to compare HIV care registration rates by testing modality (sero-survey vs. clinic) in each analytic dataset. We assessed linkage quality using three approaches: quantifying linkage errors, comparing characteristics between linked and unlinked data, and evaluating bias and precision of regression estimates. RESULTS: Between 2014 and 2017, 405 individuals with gold standard links were newly diagnosed with HIV in sero-surveys (n = 263) and clinics (n = 142). Automated probabilistic linkage correctly identified 233 individuals (positive predictive value [PPV] = 65%) at the low threshold and 95 individuals (PPV = 90%) at the high threshold. Significant differences were found between linked and unlinked records in primary exposure and outcome variables and for adjusting covariates at every threshold. As expected, differences attenuated with increasing threshold. Testing modality was significantly associated with time to registration in the gold standard data (adjusted hazard ratio [HR] 4.98 for clinic-based testing, 95% confidence interval [CI] 3.34, 7.42). Increasing false matches weakened the association (HR 2.76 at minimum match score threshold, 95% CI 1.73, 4.41). Increasing missed matches (i.e., increasing match score threshold and positive predictive value of the linkage algorithm) was strongly correlated with a reduction in the precision of coefficient estimate (R² = 0.97; p = 0.03). CONCLUSIONS: Similar to studies with more negligible levels of linkage errors, false matches in this setting reduced the magnitude of the association; missed matches reduced precision. Adjusting for these biases could provide more robust results using data with considerable linkage errors. BioMed Central 2018-12-10 /pmc/articles/PMC6288858/ /pubmed/30526518 http://dx.doi.org/10.1186/s12874-018-0632-5 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Rentsch, Christopher T.
Harron, Katie
Urassa, Mark
Todd, Jim
Reniers, Georges
Zaba, Basia
Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title_full Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title_fullStr Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title_full_unstemmed Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title_short Impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural Tanzania
title_sort impact of linkage quality on inferences drawn from analyses using data with high rates of linkage errors in rural tanzania
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6288858/
https://www.ncbi.nlm.nih.gov/pubmed/30526518
http://dx.doi.org/10.1186/s12874-018-0632-5