
Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening

BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms/complaints may be signals that predict who will go on to be diagnosed with mild c...


Bibliographic Details
Main Authors: Penfold, Robert B., Carrell, David S., Cronkite, David J., Pabiniak, Chester, Dodd, Tammy, Glass, Ashley MH, Johnson, Eric, Thompson, Ella, Arrighi, H. Michael, Stang, Paul E.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9097352/
https://www.ncbi.nlm.nih.gov/pubmed/35549702
http://dx.doi.org/10.1186/s12911-022-01864-z
author Penfold, Robert B.
Carrell, David S.
Cronkite, David J.
Pabiniak, Chester
Dodd, Tammy
Glass, Ashley MH
Johnson, Eric
Thompson, Ella
Arrighi, H. Michael
Stang, Paul E.
author_sort Penfold, Robert B.
collection PubMed
description BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms and complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer’s disease or related dementias. Our objective was to develop a natural language processing (NLP) system and prediction model to identify MCI from clinical text in the absence of screening or other structured diagnostic information.
METHODS: There were two patient populations: 1794 participants in the Adult Changes in Thought (ACT) study and 2391 patients from the general population of Kaiser Permanente Washington. All individuals had standardized cognitive assessment scores. We excluded patients with a diagnosis of Alzheimer’s disease or dementia, or who used donepezil. We manually annotated 10,391 clinic notes to train the NLP model. Standard Python code was used to extract phrases from notes and map each phrase to a cognitive functioning concept. Concepts derived from the NLP system were used to predict future MCI. The prediction model was trained on the ACT cohort and 60% of the general population cohort, with the remaining 40% withheld for validation. We fit the prediction model using least absolute shrinkage and selection operator (LASSO) logistic regression, with MCI as the prediction target. Using the predicted case status from the LASSO model and known MCI from standardized scores, we constructed receiver operating characteristic (ROC) curves to measure model performance.
RESULTS: Chart abstraction identified 42 MCI concepts. Prediction model performance in the validation data set was modest, with an area under the curve (AUC) of 0.67. Setting the cutoff for correct classification at 0.60, the classifier yielded a sensitivity of 1.7%, specificity of 99.7%, positive predictive value (PPV) of 70% and negative predictive value (NPV) of 70.5% in the validation cohort.
DISCUSSION AND CONCLUSION: Although the sensitivity of the machine learning model was poor, negative predictive value was high, an important characteristic of models used for population-based screening. While an AUC of 0.67 is generally considered moderate performance, it is also comparable to several tests that are widely used in clinical practice.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-022-01864-z.
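The evaluation step described in the abstract can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the labels and predicted probabilities are made-up toy values, the function names `screening_metrics` and `rank_auc` are hypothetical, and treating a probability at or above the cutoff as a predicted case is an assumption.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int], y_prob: Sequence[float],
                      cutoff: float = 0.60) -> Dict[str, float]:
    """Confusion-matrix metrics at a probability cutoff.

    Assumption: a predicted probability >= cutoff counts as a predicted case.
    """
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif not predicted and truth:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases that are flagged
        "specificity": ratio(tn, tn + fp),  # non-cases that are not flagged
        "ppv": ratio(tp, tp + fp),          # flagged patients who are cases
        "npv": ratio(tn, tn + fn),          # unflagged patients who are non-cases
    }


def rank_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve as the share of (case, non-case) pairs
    in which the case has the higher score; ties count as half."""
    pos = [p for t, p in zip(y_true, y_prob) if t]
    neg = [p for t, p in zip(y_true, y_prob) if not t]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = MCI by standardized score, 0 = no MCI.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.72, 0.41, 0.35, 0.65, 0.30, 0.22, 0.18, 0.10]

metrics = screening_metrics(y_true, y_prob, cutoff=0.60)
auc = rank_auc(y_true, y_prob)
print({name: round(value, 3) for name, value in metrics.items()})
print("AUC:", round(auc, 3))
```

On these toy values only one of three true cases scores above the cutoff, which mirrors in miniature how a high cutoff can give high specificity alongside low sensitivity.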
format Online
Article
Text
id pubmed-9097352
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9097352 2022-05-13 Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening Penfold, Robert B. Carrell, David S. Cronkite, David J. Pabiniak, Chester Dodd, Tammy Glass, Ashley MH Johnson, Eric Thompson, Ella Arrighi, H. Michael Stang, Paul E. BMC Med Inform Decis Mak Research Article BioMed Central 2022-05-12 /pmc/articles/PMC9097352/ /pubmed/35549702 http://dx.doi.org/10.1186/s12911-022-01864-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9097352/
https://www.ncbi.nlm.nih.gov/pubmed/35549702
http://dx.doi.org/10.1186/s12911-022-01864-z