
Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data

Background: Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, whi...

Bibliographic Details
Main Authors: Ford, Elizabeth, Rooney, Philip, Hurley, Peter, Oliver, Seb, Bremner, Stephen, Cassell, Jackie
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Public Health
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066995/
https://www.ncbi.nlm.nih.gov/pubmed/32211363
http://dx.doi.org/10.3389/fpubh.2020.00054
_version_ 1783505345170112512
author Ford, Elizabeth
Rooney, Philip
Hurley, Peter
Oliver, Seb
Bremner, Stephen
Cassell, Jackie
author_facet Ford, Elizabeth
Rooney, Philip
Hurley, Peter
Oliver, Seb
Bremner, Stephen
Cassell, Jackie
author_sort Ford, Elizabeth
collection PubMed
description Background: Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and results in a bias toward the null. Bayesian approaches allow the specification of prior information to the model, such as the likely rates of missingness in the data. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterized by under-diagnosis. Methods: Study 1: We created synthetic data, produced to mimic structured EHR data where diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data. We examined the LR parameters estimated by models with and without priors. Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We examined LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve. Results: Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis. Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR. Conclusions: The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.
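The attenuation and correction described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model: it assumes a single standard-normal covariate, perfect specificity, and a recording sensitivity that is known and fixed (the paper instead encodes misclassification rates as Bayesian priors), and it fits by maximum likelihood using only the Python standard library. The function names `simulate` and `fit` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate(n, intercept, slope, sensitivity, seed=0):
    """Draw synthetic records in which true cases are only sometimes recorded.

    Assumption: recording is independent of the covariate (non-differential)
    and no non-case is ever recorded as a case (perfect specificity).
    """
    rng = random.Random(seed)
    xs, recorded = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_case = rng.random() < sigmoid(intercept + slope * x)
        seen = true_case and rng.random() < sensitivity
        xs.append(x)
        recorded.append(1 if seen else 0)
    return xs, recorded


def fit(xs, ys, sensitivity=1.0, iterations=20):
    """Fit P(recorded = 1 | x) = sensitivity * sigmoid(a + b * x) by Fisher scoring.

    With sensitivity = 1.0 this is ordinary logistic regression (IRLS).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            s = sigmoid(a + b * x)
            p = sensitivity * s               # probability the case is recorded
            dp = sensitivity * s * (1.0 - s)  # derivative of p w.r.t. the linear predictor
            var = p * (1.0 - p)
            score = (y - p) * dp / var        # per-record score contribution
            w = dp * dp / var                 # per-record Fisher information weight
            g0 += score
            g1 += score * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = score vector.
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


if __name__ == "__main__":
    xs, ys = simulate(n=20000, intercept=0.0, slope=1.0, sensitivity=0.6, seed=1)
    print("naive fit (intercept, slope):   ", fit(xs, ys))
    print("adjusted fit (intercept, slope):", fit(xs, ys, sensitivity=0.6))
```

With `sensitivity` left at 1.0 the function treats every unrecorded patient as a true non-case, which is the naive analysis; passing the assumed recording sensitivity lets the likelihood account for true cases that were never recorded. A full Bayesian treatment would place a prior distribution on the sensitivity rather than fixing it.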
format Online
Article
Text
id pubmed-7066995
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7066995 2020-03-24 Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data Ford, Elizabeth Rooney, Philip Hurley, Peter Oliver, Seb Bremner, Stephen Cassell, Jackie Front Public Health Public Health Background: Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and results in a bias toward the null. Bayesian approaches allow the specification of prior information to the model, such as the likely rates of missingness in the data. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterized by under-diagnosis. Methods: Study 1: We created synthetic data, produced to mimic structured EHR data where diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data. We examined the LR parameters estimated by models with and without priors. Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We examined LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve. Results: Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis. Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR. Conclusions: The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data. Frontiers Media S.A. 2020-03-05 /pmc/articles/PMC7066995/ /pubmed/32211363 http://dx.doi.org/10.3389/fpubh.2020.00054 Text en Copyright © 2020 Ford, Rooney, Hurley, Oliver, Bremner and Cassell. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Public Health
Ford, Elizabeth
Rooney, Philip
Hurley, Peter
Oliver, Seb
Bremner, Stephen
Cassell, Jackie
Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title_full Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title_fullStr Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title_full_unstemmed Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title_short Can the Use of Bayesian Analysis Methods Correct for Incompleteness in Electronic Health Records Diagnosis Data? Development of a Novel Method Using Simulated and Real-Life Clinical Data
title_sort can the use of bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? development of a novel method using simulated and real-life clinical data
topic Public Health
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7066995/
https://www.ncbi.nlm.nih.gov/pubmed/32211363
http://dx.doi.org/10.3389/fpubh.2020.00054
work_keys_str_mv AT fordelizabeth cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata
AT rooneyphilip cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata
AT hurleypeter cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata
AT oliverseb cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata
AT bremnerstephen cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata
AT casselljackie cantheuseofbayesiananalysismethodscorrectforincompletenessinelectronichealthrecordsdiagnosisdatadevelopmentofanovelmethodusingsimulatedandreallifeclinicaldata