
Where No Universal Health Care Identifier Exists: Comparison and Determination of the Utility of Score-Based Persons Matching Algorithms Using Demographic Data

BACKGROUND: A universal health care identifier (UHID) facilitates the development of longitudinal medical records in health care settings where follow-up and tracking of persons across health care sectors are needed. HIV case-based surveillance (CBS) entails longitudinal follow-up of HIV cases from...


Bibliographic Details
Main Authors: Waruru, Anthony, Natukunda, Agnes, Nyagah, Lilly M, Kellogg, Timothy A, Zielinski-Gutierrez, Emily, Waruiru, Wanjiru, Masamaro, Kenneth, Harklerode, Richelle, Odhiambo, Jacob, Manders, Eric-Jan, Young, Peter W
Format: Online Article Text
Language: English
Published: JMIR Publications 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6315226/
https://www.ncbi.nlm.nih.gov/pubmed/30545805
http://dx.doi.org/10.2196/10436
collection PubMed
description BACKGROUND: A universal health care identifier (UHID) facilitates the development of longitudinal medical records in health care settings where follow-up and tracking of persons across health care sectors are needed. HIV case-based surveillance (CBS) entails longitudinal follow-up of HIV cases from diagnosis through linkage to care and treatment, and is recommended for second-generation HIV surveillance. In the absence of a UHID, records matching, linking, and deduplication may be done using score-based persons matching algorithms. We present a stepwise process of score-based persons matching algorithms based on demographic data to improve HIV CBS and other longitudinal data systems.
OBJECTIVE: The aim of this study is to compare deterministic and score-based persons matching algorithms in records linkage and matching using demographic data in settings without a UHID.
METHODS: We used HIV CBS pilot data from 124 facilities in 2 high HIV-burden counties (Siaya and Kisumu) in western Kenya. For efficient processing, data were grouped into 3 scenarios: (1) within HIV testing services (HTS), (2) HTS-care, and (3) within care. In deterministic matching, we directly compared identifiers and pseudo-identifiers from medical records to determine matches. We used the R stringdist package for Jaro and Jaro-Winkler score-based matching and for the Levenshtein and Damerau-Levenshtein string edit distance calculations. For the Jaro-Winkler method, we used a penalty (p)=0.1, and we applied 4 weights (ω) to Levenshtein and Damerau-Levenshtein: deletion ω=0.8, insertion ω=0.8, substitution ω=1, and transposition ω=0.5.
RESULTS: We abstracted 12,157 cases, of which 4073/12,157 (33.5%) were from HTS, 1091/12,157 (9.0%) from HTS-care, and 6993/12,157 (57.5%) within care. Using the deterministic process, 435/12,157 (3.6%) duplicate records were identified, yielding 96.4% (11,722/12,157) unique cases. Overall, of the score-based methods, Jaro-Winkler yielded the most duplicate records (686/12,157, 5.6%), Jaro yielded the fewest (546/12,157, 4.5%), and Levenshtein and Damerau-Levenshtein each yielded 4.6% (563/12,157). Specifically, duplicate records yielded by method were: (1) Jaro 5.7% (234/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.4% (308/6993) within care, (2) Jaro-Winkler 7.4% (302/4073) within HTS, 0.5% (6/1091) in HTS-care, and 5.4% (378/6993) within care, (3) Levenshtein 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care, and (4) Damerau-Levenshtein 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care.
CONCLUSIONS: Without deduplication, overreporting occurs across the care and treatment cascade. Jaro-Winkler score-based matching performed the best in identifying matches. A pragmatic estimate of duplicates in health care settings can provide a corrective factor for modeled estimates, for targeting and program planning. We propose that even without a UHID, a standard national deduplication and persons-matching algorithm that uses demographic data would improve accuracy in monitoring HIV care clinical cascades.
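The scoring methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (the study used the R stringdist package); it is a plain reimplementation under stated assumptions. The function names `jaro`, `jaro_winkler`, and `weighted_osa` are hypothetical, the example strings are the textbook "MARTHA"/"MARHTA" pair rather than study data, and the edit distance shown is the restricted (optimal string alignment) variant of Damerau-Levenshtein, chosen here for brevity. Only the parameter values (p=0.1 and the weights 0.8, 0.8, 1, 0.5) come from the abstract.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def weighted_osa(a: str, b: str, w_del: float = 0.8, w_ins: float = 0.8,
                 w_sub: float = 1.0, w_trans: float = 0.5) -> float:
    """Weighted edit distance with adjacent transpositions (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * w_del
    for j in range(1, cols):
        d[0][j] = j * w_ins
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            best = min(d[i - 1][j] + w_del,      # delete from a
                       d[i][j - 1] + w_ins,      # insert into a
                       d[i - 1][j - 1] + sub)    # substitute or keep
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                best = min(best, d[i - 2][j - 2] + w_trans)  # swap neighbours
            d[i][j] = best
    return d[-1][-1]


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
print(weighted_osa("MARTHA", "MARHTA"))            # 0.5 (one transposition)
```

Note that the two families point in opposite directions: Jaro and Jaro-Winkler here return a similarity (higher means more alike), whereas the edit distance returns a cost (lower means more alike). Any cutoff used to declare two records a probable duplicate would have to be chosen and validated separately; the abstract does not state one.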
format Online
Article
Text
id pubmed-6315226
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-6315226 2019-01-18 JMIR Public Health Surveill Original Paper
JMIR Publications 2018-12-13 /pmc/articles/PMC6315226/ /pubmed/30545805 http://dx.doi.org/10.2196/10436 Text en ©Anthony Waruru, Agnes Natukunda, Lilly M Nyagah, Timothy A Kellogg, Emily Zielinski-Gutierrez, Wanjiru Waruiru, Kenneth Masamaro, Richelle Harklerode, Jacob Odhiambo, Eric-Jan Manders, Peter W Young. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 13.12.2018. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
topic Original Paper