Combining population-based administrative health records and electronic medical records for disease surveillance

BACKGROUND: Administrative health records (AHRs) and electronic medical records (EMRs) are two key sources of population-based data for disease surveillance, but misclassification errors in the data can bias disease estimates. Methods that combine information from error-prone data sources can build...

Bibliographic Details
Main Authors: Al-Azazi, Saeed; Singer, Alexander; Rabbani, Rasheda; Lix, Lisa M.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6604278/
https://www.ncbi.nlm.nih.gov/pubmed/31266516
http://dx.doi.org/10.1186/s12911-019-0845-5
author Al-Azazi, Saeed
Singer, Alexander
Rabbani, Rasheda
Lix, Lisa M.
collection PubMed
description BACKGROUND: Administrative health records (AHRs) and electronic medical records (EMRs) are two key sources of population-based data for disease surveillance, but misclassification errors in the data can bias disease estimates. Methods that combine information from error-prone data sources can build on the strengths of AHRs and EMRs. We compared bias and error for four data-combining methods and applied them to estimate hypertension prevalence.
METHODS: Our study included the rule-based OR and AND methods, which identify disease cases from either or both data sources, respectively; a rule-based sensitivity-specificity adjusted (RSSA) method, which corrects for inaccuracies using a deterministic rule; and a probabilistic-based sensitivity-specificity adjusted (PSSA) method, which corrects for error using a statistical model. Computer simulation was used to estimate relative bias (RB) and mean square error (MSE) under varying conditions of population disease prevalence, correlation amongst data sources, and amount of misclassification error. AHRs and EMRs for Manitoba, Canada were used to estimate hypertension prevalence using validated case definitions and multiple disease markers.
RESULTS: The OR method had the lowest RB and MSE when population disease prevalence was 10%, and the RSSA method had the lowest RB and MSE when population prevalence increased to 20%. As the correlation between data sources increased, the OR method resulted in the lowest RB and MSE. Estimates of hypertension prevalence for AHRs and EMRs alone were 30.9% (95% CI: 30.6–31.2) and 24.9% (95% CI: 24.6–25.2), respectively. The estimates were 21.4% (95% CI: 21.1–21.7) for the AND method, 34.4% (95% CI: 34.1–34.8) for the OR method, 32.2% (95% CI: 31.8–32.6) for the RSSA method, and ranged from 34.3% (95% CI: 34.1–34.5) to 35.9% (95% CI: 35.7–36.1) for the PSSA method, depending on the statistical model.
CONCLUSIONS: The OR and AND methods are influenced by correlation amongst the data sources, while the RSSA method is dependent on the accuracy of prior sensitivity and specificity estimates. The PSSA method performed well when population prevalence was high and the average correlation amongst disease markers was low. This study will guide researchers to select a data-combining method that best suits their data characteristics.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0845-5) contains supplementary material, which is available to authorized users.
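As a rough illustration of the quantities named in the abstract, the Python sketch below combines two binary case indicators with OR and AND rules and computes relative bias and mean square error for a set of prevalence estimates. All function names, the sensitivity and specificity values, and the simulated population are hypothetical and are not taken from the article. The adjustment shown is the standard Rogan-Gladen correction, used here only as a stand-in: the abstract does not spell out the RSSA or PSSA rules, so this should not be read as the authors' method. Errors in the two sources are simulated independently, which is a simplification, since the study varied the correlation between sources. Relative bias is expressed as a fraction of the true value.

```python
import random


def combine_or(ahr, emr):
    """Flag a case when EITHER source flags it."""
    return [a or e for a, e in zip(ahr, emr)]


def combine_and(ahr, emr):
    """Flag a case only when BOTH sources flag it."""
    return [a and e for a, e in zip(ahr, emr)]


def prevalence(flags):
    """Proportion of flagged individuals."""
    return sum(flags) / len(flags)


def rogan_gladen(observed, sensitivity, specificity):
    """Rogan-Gladen correction of an observed prevalence (a stand-in,
    not the article's RSSA rule). The result is clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return min(1.0, max(0.0, (observed + specificity - 1.0) / denom))


def relative_bias(estimates, truth):
    """(mean estimate - truth) / truth, as a fraction."""
    return (sum(estimates) / len(estimates) - truth) / truth


def mean_square_error(estimates, truth):
    """Average squared distance between estimates and the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def simulate_source(truth_flags, sensitivity, specificity, rng):
    """Misclassify true disease status with the given accuracy.
    Errors are drawn independently for each person (an assumption)."""
    return [
        rng.random() < (sensitivity if t else 1.0 - specificity)
        for t in truth_flags
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    true_prev = 0.30  # hypothetical population prevalence
    results = {"OR": [], "AND": [], "adjusted AHR": []}
    for _ in range(200):  # 200 simulated populations of 5,000 people
        truth = [rng.random() < true_prev for _ in range(5000)]
        ahr = simulate_source(truth, 0.75, 0.95, rng)  # hypothetical Se/Sp
        emr = simulate_source(truth, 0.65, 0.97, rng)  # hypothetical Se/Sp
        results["OR"].append(prevalence(combine_or(ahr, emr)))
        results["AND"].append(prevalence(combine_and(ahr, emr)))
        results["adjusted AHR"].append(
            rogan_gladen(prevalence(ahr), 0.75, 0.95)
        )
    for name, estimates in results.items():
        print(name,
              "RB:", round(relative_bias(estimates, true_prev), 3),
              "MSE:", round(mean_square_error(estimates, true_prev), 5))
```

In a sketch like this, the AND rule can only lose cases relative to either source alone and the OR rule can only gain them, which matches the ordering of the AND and OR estimates reported in the abstract.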
format Online
Article
Text
id pubmed-6604278
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6604278 2019-07-12 Combining population-based administrative health records and electronic medical records for disease surveillance Al-Azazi, Saeed Singer, Alexander Rabbani, Rasheda Lix, Lisa M. BMC Med Inform Decis Mak Research Article BioMed Central 2019-07-02 /pmc/articles/PMC6604278/ /pubmed/31266516 http://dx.doi.org/10.1186/s12911-019-0845-5 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Combining population-based administrative health records and electronic medical records for disease surveillance
topic Research Article