A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records
INTRODUCTION: Learning health systems can help estimate chronic disease prevalence through distributed data networks (DDNs). Concerns remain about bias introduced to DDN prevalence estimates when individuals seeking care across systems are counted multiple times. This paper describes a process to de...
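Main Authors: | Scott, Kenneth A.; Davies, Sara Deakyne; Zucker, Rachel; Ong, Toan; Kraus, Emily McCormick; Kahn, Michael G; Bondy, Jessica; Daley, Matt F.; Horle, Kate; Bacon, Emily; Schilling, Lisa; Crume, Tessa; Hasnain‐Wynia, Romana; Foldy, Seth; Budney, Gregory; Davidson, Arthur J. |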
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284932/ https://www.ncbi.nlm.nih.gov/pubmed/35860322 http://dx.doi.org/10.1002/lrh2.10297 |
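
The abstract for this record describes counting each patient once across partners by means of an HIE-assigned network identifier, and notes that case counts differed slightly by data reconciliation strategy. The following is a minimal sketch of that idea, not the authors' implementation. It assumes one row per patient per partner; the function name `summarize`, the toy rows, and the "any"/"all" reconciliation rules are hypothetical illustrations. Only the figures 7628 and 218 437 come from the abstract.

```python
from collections import defaultdict


def summarize(records, rule="any"):
    """Summarize toy rows of (network_id, partner, is_case).

    Assumes at most one row per patient per partner (an illustrative
    simplification, not something stated in the record).
    """
    flags_by_person = defaultdict(dict)
    for network_id, partner, is_case in records:
        flags_by_person[network_id][partner] = is_case

    # Hypothetical reconciliation rules: a linked patient is a case if
    # ANY partner flags them, or only if ALL partners flag them.
    combine = any if rule == "any" else all

    unique_patients = len(flags_by_person)
    seen_in_both = sum(1 for flags in flags_by_person.values() if len(flags) > 1)
    cases = sum(1 for flags in flags_by_person.values() if combine(flags.values()))
    naive_cases = sum(1 for _, _, is_case in records if is_case)
    return {
        "unique_patients": unique_patients,
        "seen_in_both": seen_in_both,
        # Naive estimate: every partner-level row counts as a person.
        "naive_prevalence": naive_cases / len(records),
        # Deduplicated estimate: one count per network identifier.
        "dedup_prevalence": cases / unique_patients,
    }


# Made-up rows for two partners, "A" and "B".
toy_records = [
    ("n1", "A", True),
    ("n1", "B", False),
    ("n2", "A", False),
    ("n3", "B", True),
    ("n4", "A", False),
    ("n4", "B", False),
]

any_rule = summarize(toy_records, "any")
all_rule = summarize(toy_records, "all")

# Overlap share reported in the abstract: 7628 of 218 437 unique patients.
overlap_pct = round(100 * 7628 / 218437, 1)
```

In the toy data the two rules disagree only for patient `n1`, whose partners report different case status, which is the kind of discordance a reconciliation strategy has to resolve.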
_version_ | 1784747671639031808 |
---|---|
author | Scott, Kenneth A. Davies, Sara Deakyne Zucker, Rachel Ong, Toan Kraus, Emily McCormick Kahn, Michael G Bondy, Jessica Daley, Matt F. Horle, Kate Bacon, Emily Schilling, Lisa Crume, Tessa Hasnain‐Wynia, Romana Foldy, Seth Budney, Gregory Davidson, Arthur J. |
author_facet | Scott, Kenneth A. Davies, Sara Deakyne Zucker, Rachel Ong, Toan Kraus, Emily McCormick Kahn, Michael G Bondy, Jessica Daley, Matt F. Horle, Kate Bacon, Emily Schilling, Lisa Crume, Tessa Hasnain‐Wynia, Romana Foldy, Seth Budney, Gregory Davidson, Arthur J. |
author_sort | Scott, Kenneth A. |
collection | PubMed |
description | INTRODUCTION: Learning health systems can help estimate chronic disease prevalence through distributed data networks (DDNs). Concerns remain about bias introduced to DDN prevalence estimates when individuals seeking care across systems are counted multiple times. This paper describes a process to deduplicate individuals for DDN prevalence estimates. METHODS: We operationalized a two‐step deduplication process, leveraging health information exchange (HIE)‐assigned network identifiers, within the Colorado Health Observation Regional Data Service (CHORDS) DDN. We generated prevalence estimates for type 1 and type 2 diabetes among pediatric patients (0‐17 years) with at least one 2017 encounter in one of two geographically‐proximate DDN partners. We assessed the extent of cross‐system duplication and its effect on prevalence estimates. RESULTS: We identified 218 437 unique pediatric patients seen across systems during 2017, including 7628 (3.5%) seen in both. We found no measurable difference in prevalence after deduplication. The number of cases we identified differed slightly by data reconciliation strategy. Concordance of linked patients' demographic attributes varied by attribute. CONCLUSIONS: We implemented an HIE‐dependent, extensible process that deduplicates individuals for less biased prevalence estimates in a DDN. Our null pilot findings have limited generalizability. Overlap was small and likely insufficient to influence prevalence estimates. Other factors, including the number and size of partners, the matching algorithm, and the electronic phenotype may influence the degree of deduplication bias. Additional use cases may help improve understanding of duplication bias and reveal other principles and insights. This study informed how DDNs could support learning health systems' response to public health challenges and improve regional health. |
format | Online Article Text |
id | pubmed-9284932 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-92849322022-07-19 A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records Scott, Kenneth A. Davies, Sara Deakyne Zucker, Rachel Ong, Toan Kraus, Emily McCormick Kahn, Michael G Bondy, Jessica Daley, Matt F. Horle, Kate Bacon, Emily Schilling, Lisa Crume, Tessa Hasnain‐Wynia, Romana Foldy, Seth Budney, Gregory Davidson, Arthur J. Learn Health Syst Research Reports INTRODUCTION: Learning health systems can help estimate chronic disease prevalence through distributed data networks (DDNs). Concerns remain about bias introduced to DDN prevalence estimates when individuals seeking care across systems are counted multiple times. This paper describes a process to deduplicate individuals for DDN prevalence estimates. METHODS: We operationalized a two‐step deduplication process, leveraging health information exchange (HIE)‐assigned network identifiers, within the Colorado Health Observation Regional Data Service (CHORDS) DDN. We generated prevalence estimates for type 1 and type 2 diabetes among pediatric patients (0‐17 years) with at least one 2017 encounter in one of two geographically‐proximate DDN partners. We assessed the extent of cross‐system duplication and its effect on prevalence estimates. RESULTS: We identified 218 437 unique pediatric patients seen across systems during 2017, including 7628 (3.5%) seen in both. We found no measurable difference in prevalence after deduplication. The number of cases we identified differed slightly by data reconciliation strategy. Concordance of linked patients' demographic attributes varied by attribute. CONCLUSIONS: We implemented an HIE‐dependent, extensible process that deduplicates individuals for less biased prevalence estimates in a DDN. Our null pilot findings have limited generalizability. Overlap was small and likely insufficient to influence prevalence estimates. 
Other factors, including the number and size of partners, the matching algorithm, and the electronic phenotype may influence the degree of deduplication bias. Additional use cases may help improve understanding of duplication bias and reveal other principles and insights. This study informed how DDNs could support learning health systems' response to public health challenges and improve regional health. John Wiley and Sons Inc. 2021-11-28 /pmc/articles/PMC9284932/ /pubmed/35860322 http://dx.doi.org/10.1002/lrh2.10297 Text en © 2021 The Authors. Learning Health Systems published by Wiley Periodicals LLC on behalf of University of Michigan. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Reports Scott, Kenneth A. Davies, Sara Deakyne Zucker, Rachel Ong, Toan Kraus, Emily McCormick Kahn, Michael G Bondy, Jessica Daley, Matt F. Horle, Kate Bacon, Emily Schilling, Lisa Crume, Tessa Hasnain‐Wynia, Romana Foldy, Seth Budney, Gregory Davidson, Arthur J. A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title | A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title_full | A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title_fullStr | A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title_full_unstemmed | A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title_short | A process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
title_sort | process to deduplicate individuals for regional chronic disease prevalence estimates using a distributed data network of electronic health records |
topic | Research Reports |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284932/ https://www.ncbi.nlm.nih.gov/pubmed/35860322 http://dx.doi.org/10.1002/lrh2.10297 |