
Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes

It is incumbent upon all researchers who use the electronic health record (EHR), including data scientists, to understand the quality of such data. EHR data may be subject to measurement error or misclassification that have the potential to bias results, unless one applies the available computationa...


Bibliographic Details
Main Authors: Goldstein, Neal D., Kahal, Deborah, Testa, Karla, Gracely, Ed J., Burstyn, Igor
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9624477/
https://www.ncbi.nlm.nih.gov/pubmed/36324333
http://dx.doi.org/10.1162/99608f92.cbe67e91
_version_ 1784822242039824384
author Goldstein, Neal D.
Kahal, Deborah
Testa, Karla
Gracely, Ed J.
Burstyn, Igor
author_facet Goldstein, Neal D.
Kahal, Deborah
Testa, Karla
Gracely, Ed J.
Burstyn, Igor
author_sort Goldstein, Neal D.
collection PubMed
description It is incumbent upon all researchers who use the electronic health record (EHR), including data scientists, to understand the quality of such data. EHR data may be subject to measurement error or misclassification that have the potential to bias results, unless one applies the available computational techniques specifically created for this problem. In this article, we begin with a discussion of data-quality issues in the EHR focusing on health outcomes. We review the concepts of sensitivity, specificity, positive and negative predictive values, and demonstrate how the imperfect classification of a dichotomous outcome variable can bias an analysis, both in terms of prevalence of the outcome, and relative risk of the outcome under one treatment regime (aka exposure) compared to another. This is then followed by a description of a generalizable approach to probabilistic (quantitative) bias analysis using a combination of regression estimation of the parameters that relate the true and observed data and application of these estimates to adjust the prevalence and relative risk that may have existed if there was no misclassification. We describe bias analysis that accounts for both random and systematic errors and highlight its limitations. We then motivate a case study with the goal of validating the accuracy of a health outcome, chronic infection with hepatitis C virus, derived from a diagnostic code in the EHR. Finally, we demonstrate our approaches on the case study and conclude by summarizing the literature on outcome misclassification and quantitative bias analysis.
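The relationships the abstract names (sensitivity, specificity, predictive values, and the bias that misclassification induces in prevalence and relative risk) can be illustrated with a short deterministic sketch. This is not the authors' probabilistic bias analysis, which the abstract describes as regression-based; it is only the standard textbook algebra, using a Rogan-Gladen style back-correction. All function names and numeric inputs below are hypothetical and chosen for illustration.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Prevalence expected in the data when the outcome is imperfectly classified."""
    true_positives = sensitivity * true_prev
    false_positives = (1.0 - specificity) * (1.0 - true_prev)
    return true_positives + false_positives


def corrected_prevalence(observed_prev, sensitivity, specificity):
    """Back-correct an observed prevalence (Rogan-Gladen style estimator)."""
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (observed_prev + specificity - 1.0) / youden
    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


def predictive_values(true_prev, sensitivity, specificity):
    """Return (PPV, NPV) implied by prevalence, sensitivity, and specificity."""
    tp = sensitivity * true_prev
    fp = (1.0 - specificity) * (1.0 - true_prev)
    fn = (1.0 - sensitivity) * true_prev
    tn = specificity * (1.0 - true_prev)
    return tp / (tp + fp), tn / (tn + fn)


def observed_relative_risk(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Relative risk one would observe under nondifferential outcome misclassification."""
    return (observed_prevalence(risk_exposed, sensitivity, specificity)
            / observed_prevalence(risk_unexposed, sensitivity, specificity))


if __name__ == "__main__":
    # Hypothetical inputs: 10% true prevalence, Se = 0.80, Sp = 0.95.
    se, sp = 0.80, 0.95
    p_obs = observed_prevalence(0.10, se, sp)
    print("observed prevalence:", round(p_obs, 4))
    print("corrected prevalence:", round(corrected_prevalence(p_obs, se, sp), 4))
    print("PPV, NPV:", tuple(round(v, 4) for v in predictive_values(0.10, se, sp)))
    # True risks of 0.20 vs 0.10 give a true relative risk of 2.0.
    print("observed RR:", round(observed_relative_risk(0.20, 0.10, se, sp), 4))
```

With these assumed inputs the observed relative risk falls below the true value of 2.0, which is the attenuation toward the null that nondifferential misclassification of a dichotomous outcome typically produces. A probabilistic analysis would replace the fixed sensitivity and specificity with draws from distributions that reflect uncertainty in those parameters.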
format Online
Article
Text
id pubmed-9624477
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-9624477 2022-11-01 Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes Goldstein, Neal D. Kahal, Deborah Testa, Karla Gracely, Ed J. Burstyn, Igor Harv Data Sci Rev Article
2022 2022-04-28 /pmc/articles/PMC9624477/ /pubmed/36324333 http://dx.doi.org/10.1162/99608f92.cbe67e91 Text en https://creativecommons.org/licenses/by/4.0/ This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.
spellingShingle Article
Goldstein, Neal D.
Kahal, Deborah
Testa, Karla
Gracely, Ed J.
Burstyn, Igor
Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title_full Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title_fullStr Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title_full_unstemmed Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title_short Data Quality in Electronic Health Record Research: An Approach for Validation and Quantitative Bias Analysis for Imperfectly Ascertained Health Outcomes Via Diagnostic Codes
title_sort data quality in electronic health record research: an approach for validation and quantitative bias analysis for imperfectly ascertained health outcomes via diagnostic codes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9624477/
https://www.ncbi.nlm.nih.gov/pubmed/36324333
http://dx.doi.org/10.1162/99608f92.cbe67e91
work_keys_str_mv AT goldsteinneald dataqualityinelectronichealthrecordresearchanapproachforvalidationandquantitativebiasanalysisforimperfectlyascertainedhealthoutcomesviadiagnosticcodes
AT kahaldeborah dataqualityinelectronichealthrecordresearchanapproachforvalidationandquantitativebiasanalysisforimperfectlyascertainedhealthoutcomesviadiagnosticcodes
AT testakarla dataqualityinelectronichealthrecordresearchanapproachforvalidationandquantitativebiasanalysisforimperfectlyascertainedhealthoutcomesviadiagnosticcodes
AT gracelyedj dataqualityinelectronichealthrecordresearchanapproachforvalidationandquantitativebiasanalysisforimperfectlyascertainedhealthoutcomesviadiagnosticcodes
AT burstynigor dataqualityinelectronichealthrecordresearchanapproachforvalidationandquantitativebiasanalysisforimperfectlyascertainedhealthoutcomesviadiagnosticcodes