Extraction of clinical phenotypes for Alzheimer’s disease dementia from clinical notes using natural language processing

OBJECTIVES: There is much interest in utilizing clinical data for developing prediction models for Alzheimer’s disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However,...

Bibliographic Details
Main Authors: Oh, Inez Y, Schindler, Suzanne E, Ghoshal, Nupur, Lai, Albert M, Payne, Philip R O, Gupta, Aditi
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9952043/
https://www.ncbi.nlm.nih.gov/pubmed/36844369
http://dx.doi.org/10.1093/jamiaopen/ooad014
_version_ 1784893532105867264
author Oh, Inez Y
Schindler, Suzanne E
Ghoshal, Nupur
Lai, Albert M
Payne, Philip R O
Gupta, Aditi
collection PubMed
description OBJECTIVES: There is much interest in utilizing clinical data for developing prediction models for Alzheimer’s disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. MATERIALS AND METHODS: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. RESULTS: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen’s kappa = 0.72–1) and positively correlated with the NLP-based phenotype extraction pipeline’s performance (average F1-score = 0.65–0.99) for each phenotype. DISCUSSION: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. CONCLUSION: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
format Online
Article
Text
id pubmed-9952043
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9952043 2023-02-25 Extraction of clinical phenotypes for Alzheimer’s disease dementia from clinical notes using natural language processing Oh, Inez Y Schindler, Suzanne E Ghoshal, Nupur Lai, Albert M Payne, Philip R O Gupta, Aditi JAMIA Open Research and Applications Oxford University Press 2023-02-24 /pmc/articles/PMC9952043/ /pubmed/36844369 http://dx.doi.org/10.1093/jamiaopen/ooad014 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Extraction of clinical phenotypes for Alzheimer’s disease dementia from clinical notes using natural language processing
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9952043/
https://www.ncbi.nlm.nih.gov/pubmed/36844369
http://dx.doi.org/10.1093/jamiaopen/ooad014
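
The abstract in this record reports interannotator agreement as Cohen’s kappa and pipeline performance as an F1-score. As a reading aid, the sketch below shows how those two metrics are conventionally computed for a single binary phenotype. It is not the authors' evaluation code: the function names, the 0/1 label encoding, and the toy label lists are all assumptions made for illustration, and the record does not say how the paper averaged F1 across notes or phenotypes.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1 of predicted labels against gold labels for one positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = phenotype documented in the note, 0 = absent.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0]

kappa = cohen_kappa(annotator_a, annotator_b)
f1 = f1_score(annotator_a, annotator_b)
print(f"kappa={kappa:.2f} f1={f1:.2f}")
```

In this sketch one annotator's labels stand in for the gold standard and the other's for the pipeline output; in practice the predicted labels would come from the NLP system, and kappa would be computed only between the two human annotators.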