
Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city



Bibliographic Details
Main authors: Kim, Ryung S.; Shankar, Viswanathan
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7137316/
https://www.ncbi.nlm.nih.gov/pubmed/32252642
http://dx.doi.org/10.1186/s12874-020-00956-6
Abstract:
BACKGROUND: Electronic Health Records (EHR) are increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods that estimate the prevalence of health indicators from large, real-time EHR while correcting for this potential bias.
METHODS: We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller's method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. We then adopted the method of Schenker et al., which multiply imputes the subject-level health outcomes that are missing for subjects in EHR. This procedure requires information linking some subjects across the two sources, a model of the misclassification mechanism in EHR, and models of the inclusion probabilities for both sources.
RESULTS: In a simulation study, both estimators yielded negligible bias even when EHR was biased. They performed as well as the health survey estimator when EHR bias was large, and better than the health survey estimator when EHR bias was moderate. For the subject-level imputation estimator, modeling the misclassification mechanism in real data may be challenging. We illustrated the methods by analyzing six health indicators from the 2013–14 NYC HANES and the 2013 NYC Macroscope, together with a study that linked some subjects across both data sources.
CONCLUSIONS: When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through joint analysis of the two sources.
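The abstract names the two estimators but gives no formulas. The Python sketch below illustrates the general ideas under stated assumptions; it is not the authors' implementation. `composite_prevalence` is a hypothetical, generic MSE-weighted composite of an unbiased survey estimate and a possibly biased EHR estimate. It assumes the two estimates are independent and plugs in the observed difference as the EHR bias, which follows the spirit of pooling with a possibly biased estimator but may differ from the exact weighting in Mosteller's method as used in the paper. `rubin_combine` applies Rubin's standard combining rules, which is the usual way to pool results across multiple imputations such as those produced in Schenker et al.'s procedure; the imputation model itself (misclassification and inclusion probabilities) is not shown. All function names and numbers are illustrative assumptions.

```python
import math
import statistics


def composite_prevalence(p_survey, se_survey, p_ehr, se_ehr):
    """Hypothetical sketch: pool a survey prevalence with a possibly biased EHR prevalence.

    Assumes independent estimates. The survey weight w minimizes the MSE
    w**2 * var_s + (1 - w)**2 * (var_e + bias**2), with the observed
    difference between the two estimates plugged in as the EHR bias.
    """
    var_s, var_e = se_survey ** 2, se_ehr ** 2
    bias_sq = (p_ehr - p_survey) ** 2  # plug-in estimate of the squared EHR bias
    denom = var_s + var_e + bias_sq
    if denom == 0:
        return p_survey, 1.0, 0.0  # degenerate case: both estimates exact and equal
    w = (var_e + bias_sq) / denom
    estimate = w * p_survey + (1 - w) * p_ehr
    # At the optimal w the plug-in MSE simplifies to w * var_s. This ignores the
    # extra noise from estimating the bias, so the value is optimistic.
    rmse = math.sqrt(w * var_s)
    return estimate, w, rmse


def rubin_combine(estimates, variances):
    """Combine m >= 2 completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w_bar = statistics.fmean(variances)  # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the study.
    est, w, rmse = composite_prevalence(p_survey=0.30, se_survey=0.02, p_ehr=0.25, se_ehr=0.002)
    print(f"pooled={est:.3f}, survey weight={w:.2f}, approx RMSE={rmse:.4f}")
    q, se = rubin_combine([0.21, 0.23, 0.22], [0.0004, 0.0005, 0.0004])
    print(f"MI estimate={q:.3f}, SE={se:.4f}")
```

In this sketch the survey weight grows toward 1 as the apparent EHR bias grows, so a badly biased EHR estimate is largely ignored. When the two sources agree, the pooled estimate borrows precision from the large EHR sample. This mirrors the qualitative pattern described in the results, although the plug-in error estimate understates the true uncertainty.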
Journal: BMC Med Res Methodol (Research Article)
Published online: 2020-04-06, BioMed Central
DOI: 10.1186/s12874-020-00956-6
License: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.