A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records

Bibliographic details
Main authors: Scott, Kenneth A.; Davies, Sara Deakyne; Zucker, Rachel; Ong, Toan; Kraus, Emily McCormick; Kahn, Michael G; Bondy, Jessica; Daley, Matt F.; Horle, Kate; Bacon, Emily; Schilling, Lisa; Crume, Tessa; Hasnain‐Wynia, Romana; Foldy, Seth; Budney, Gregory; Davidson, Arthur J.
Format: Online article, text
Language: English
Published: John Wiley and Sons Inc., 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284932/
https://www.ncbi.nlm.nih.gov/pubmed/35860322
http://dx.doi.org/10.1002/lrh2.10297

Abstract

INTRODUCTION: Learning health systems can help estimate chronic disease prevalence through distributed data networks (DDNs). Concerns remain about bias introduced to DDN prevalence estimates when individuals seeking care across systems are counted multiple times. This paper describes a process to deduplicate individuals for DDN prevalence estimates.

METHODS: We operationalized a two‐step deduplication process, leveraging health information exchange (HIE)‐assigned network identifiers, within the Colorado Health Observation Regional Data Service (CHORDS) DDN. We generated prevalence estimates for type 1 and type 2 diabetes among pediatric patients (0‐17 years) with at least one 2017 encounter in one of two geographically proximate DDN partners. We assessed the extent of cross‐system duplication and its effect on prevalence estimates.

RESULTS: We identified 218 437 unique pediatric patients seen across systems during 2017, including 7628 (3.5%) seen in both. We found no measurable difference in prevalence after deduplication. The number of cases we identified differed slightly by data reconciliation strategy. Concordance of linked patients' demographic attributes varied by attribute.

CONCLUSIONS: We implemented an HIE‐dependent, extensible process that deduplicates individuals for less biased prevalence estimates in a DDN. Our null pilot findings have limited generalizability. Overlap was small and likely insufficient to influence prevalence estimates. Other factors, including the number and size of partners, the matching algorithm, and the electronic phenotype, may influence the degree of deduplication bias. Additional use cases may help improve understanding of duplication bias and reveal other principles and insights. This study informed how DDNs could support learning health systems' response to public health challenges and improve regional health.
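
The abstract does not spell out the two-step procedure, so what follows is only a simplified, hypothetical illustration of the underlying idea, not the CHORDS implementation. It assumes record-level rows keyed by an HIE-assigned identifier can be combined across partners (a real DDN may instead work through aggregate queries). Records are collapsed to one person per identifier before computing prevalence, and a person flagged as a case by one partner but not another is resolved by a reconciliation rule. The `PatientRecord` type, the `network_id` field, the `"any"`/`"all"` strategies, and the toy records are all assumptions invented for illustration. For scale, the reported overlap of 7628 of 218 437 patients corresponds to 7628 / 218437 ≈ 0.0349, or about 3.5%.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class PatientRecord:
    network_id: str  # hypothetical HIE-assigned identifier shared across partners
    partner: str     # DDN partner that contributed this record
    is_case: bool    # whether this partner's phenotype flagged the condition


def naive_prevalence(records: Iterable[PatientRecord]) -> float:
    """Prevalence without deduplication: a person seen at two partners counts twice."""
    rows = list(records)
    cases = sum(r.is_case for r in rows)
    return cases / len(rows)


def deduplicated_prevalence(records: Iterable[PatientRecord], strategy: str = "any") -> float:
    """Collapse records to one person per network_id, then reconcile case status.

    "any": a case if any contributing record flagged the person (hypothetical rule).
    "all": a case only if every contributing record flagged the person (hypothetical rule).
    """
    rules: Dict[str, Callable[[Iterable[bool]], bool]] = {"any": any, "all": all}
    if strategy not in rules:
        raise ValueError(f"unknown strategy: {strategy!r}")
    flags: Dict[str, List[bool]] = {}
    for r in records:
        flags.setdefault(r.network_id, []).append(r.is_case)
    cases = sum(rules[strategy](person_flags) for person_flags in flags.values())
    return cases / len(flags)


def overlap_fraction(records: Iterable[PatientRecord]) -> float:
    """Share of unique people seen at more than one partner."""
    partners: Dict[str, Set[str]] = {}
    for r in records:
        partners.setdefault(r.network_id, set()).add(r.partner)
    shared = sum(1 for p in partners.values() if len(p) > 1)
    return shared / len(partners)


# Toy records invented for illustration; not study data.
records = [
    PatientRecord("N1", "A", True),
    PatientRecord("N1", "B", False),  # same person seen at a second partner
    PatientRecord("N2", "A", False),
    PatientRecord("N3", "B", True),
    PatientRecord("N4", "B", False),
]

if __name__ == "__main__":
    print("naive:", naive_prevalence(records))
    print("dedup (any):", deduplicated_prevalence(records, "any"))
    print("dedup (all):", deduplicated_prevalence(records, "all"))
    print("overlap:", overlap_fraction(records))
```

With the toy data, naive counting gives 2/5 = 0.40, deduplication with `"any"` gives 2/4 = 0.50, and `"all"` gives 1/4 = 0.25. This shows how both double counting and the choice of reconciliation rule can shift an estimate, even though the study itself found no measurable difference at its 3.5% overlap.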

Journal: Learn Health Syst (Learning Health Systems)
Section: Research Reports
Publication date: 2021-11-28
Copyright: © 2021 The Authors. Learning Health Systems published by Wiley Periodicals LLC on behalf of University of Michigan. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.