
Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records

BACKGROUND: Dementia is underdiagnosed in both the general population and among Veterans. This underdiagnosis decreases quality of life, reduces opportunities for interventions, and increases health-care costs. New approaches are therefore necessary to facilitate the timely detection of dementia. Th...


Bibliographic Details
Main Authors: Shao, Yijun; Zeng, Qing T.; Chen, Kathryn K.; Shutes-David, Andrew; Thielke, Stephen M.; Tsuang, Debby W.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617952/
https://www.ncbi.nlm.nih.gov/pubmed/31288818
http://dx.doi.org/10.1186/s12911-019-0846-4
author Shao, Yijun
Zeng, Qing T.
Chen, Kathryn K.
Shutes-David, Andrew
Thielke, Stephen M.
Tsuang, Debby W.
collection PubMed
description BACKGROUND: Dementia is underdiagnosed in both the general population and among Veterans. This underdiagnosis decreases quality of life, reduces opportunities for interventions, and increases health-care costs. New approaches are therefore necessary to facilitate the timely detection of dementia. This study seeks to identify cases of undiagnosed dementia by developing and validating a weakly supervised machine-learning approach that incorporates the analysis of both structured and unstructured electronic health record (EHR) data. METHODS: A topic modeling approach that included latent Dirichlet allocation, stable topic extraction, and random sampling was applied to VHA EHRs. Topic features from unstructured data and features from structured data were compared between Veterans with (n = 1861) and without (n = 9305) ICD-9 dementia codes. A logistic regression model was used to develop dementia prediction scores, and manual reviews were conducted to validate the machine-learning results. RESULTS: A total of 853 features were identified (290 topics, 174 non-dementia ICD codes, 159 CPT codes, 59 medications, and 171 note types) for the development of logistic regression prediction scores. These scores were validated in a subset of Veterans without ICD-9 dementia codes (n = 120) by experts in dementia who performed manual record reviews and achieved a high level of inter-rater agreement. The manual reviews were used to develop a receiver operating characteristic (ROC) curve with different thresholds for case detection, including a threshold of 0.061, which produced an optimal sensitivity (0.825) and specificity (0.832). CONCLUSIONS: Dementia is underdiagnosed, and thus, ICD codes alone cannot serve as a gold standard for diagnosis. However, this study suggests that imperfect data (e.g., ICD codes in combination with other EHR features) can serve as a silver standard to develop a risk model, apply that model to patients without dementia codes, and then select a case-detection threshold. The study is one of the first to utilize both structured and unstructured EHRs to develop risk scores for the diagnosis of dementia.
format Online
Article
Text
id pubmed-6617952
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6617952 2019-07-22 Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records Shao, Yijun Zeng, Qing T. Chen, Kathryn K. Shutes-David, Andrew Thielke, Stephen M. Tsuang, Debby W. BMC Med Inform Decis Mak Research Article BioMed Central 2019-07-09 /pmc/articles/PMC6617952/ /pubmed/31288818 http://dx.doi.org/10.1186/s12911-019-0846-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records
topic Research Article