
Classifying the lifestyle status for Alzheimer’s disease from clinical notes using deep learning with weak supervision

BACKGROUND: Since no effective therapies exist for Alzheimer’s disease (AD), prevention has become more critical through lifestyle status changes and interventions. Analyzing electronic health records (EHRs) of patients with AD can help us better understand lifestyle’s effect on AD. However, lifesty...


Bibliographic Details
Main authors:	Shen, Zitao, Schutte, Dalton, Yi, Yoonkwon, Bompelli, Anusha, Yu, Fang, Wang, Yanshan, Zhang, Rui
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9261217/ https://www.ncbi.nlm.nih.gov/pubmed/35799294 http://dx.doi.org/10.1186/s12911-022-01819-4

_version_	1784742224064413696
author	Shen, Zitao Schutte, Dalton Yi, Yoonkwon Bompelli, Anusha Yu, Fang Wang, Yanshan Zhang, Rui
author_sort	Shen, Zitao
collection	PubMed
description	BACKGROUND: Since no effective therapies exist for Alzheimer’s disease (AD), prevention has become more critical through lifestyle status changes and interventions. Analyzing electronic health records (EHRs) of patients with AD can help us better understand lifestyle’s effect on AD. However, lifestyle information is typically stored in clinical narratives. Thus, the objective of the study was to compare different natural language processing (NLP) models on classifying the lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English. METHODS: Based on the collected concept unique identifiers (CUIs) associated with the lifestyle status, we extracted all related EHRs for patients with AD from the Clinical Data Repository (CDR) of the University of Minnesota (UMN). We automatically generated labels for the training data by using a rule-based NLP algorithm. We conducted weak supervision for pre-trained Bidirectional Encoder Representations from Transformers (BERT) models and three traditional machine learning models as baseline models on the weakly labeled training corpus. These models include the BERT base model, PubMedBERT (abstracts + full text), PubMedBERT (only abstracts), Unified Medical Language System (UMLS) BERT, Bio BERT, Bio-clinical BERT, logistic regression, support vector machine, and random forest. We performed two case studies, physical activity and excessive diet, to validate the effectiveness of BERT models in classifying lifestyle status. All models were evaluated and compared on the developed Gold Standard Corpus (GSC) for the two case studies. The rule-based model used for weak supervision was also tested on the GSC for comparison. RESULTS: The UMLS BERT model achieved the best performance for classifying status of physical activity, with precision, recall, and F-1 scores of 0.93, 0.93, and 0.92, respectively. Regarding classifying excessive diet, the Bio-clinical BERT model showed the best performance with precision, recall, and F-1 scores of 0.93, 0.93, and 0.93, respectively. CONCLUSION: The proposed approach leveraging weak supervision could significantly increase the sample size, which is required for training the deep learning models. By comparing with the traditional machine learning models, the study also demonstrates the high performance of BERT models for classifying lifestyle status for Alzheimer’s disease in clinical notes.
format	Online Article Text
id	pubmed-9261217
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9261217 2022-07-07 BMC Med Inform Decis Mak Research BioMed Central 2022-07-07 /pmc/articles/PMC9261217/ /pubmed/35799294 http://dx.doi.org/10.1186/s12911-022-01819-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Classifying the lifestyle status for Alzheimer’s disease from clinical notes using deep learning with weak supervision
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9261217/ https://www.ncbi.nlm.nih.gov/pubmed/35799294 http://dx.doi.org/10.1186/s12911-022-01819-4
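
The description field above says that training labels were generated automatically by a rule-based NLP algorithm and that the rule-based model was itself scored on the Gold Standard Corpus with precision, recall, and F-1. The following is a minimal Python sketch of that idea only, not the authors' method: the keyword cues, the example notes, the gold labels, and the function names `weak_label` and `precision_recall_f1` are all hypothetical, since this record does not give the actual rules or data.

```python
# Hypothetical keyword cues; the study's real rule set is not given in this record.
ACTIVE_CUES = ("walks daily", "exercises", "jogging")
INACTIVE_CUES = ("sedentary", "does not exercise", "no exercise")


def weak_label(note: str) -> int:
    """Assign a weak label: 1 = physically active, 0 = not active.

    Inactive cues are checked first so that phrases such as
    "no exercise" are not mistaken for a positive mention.
    """
    text = note.lower()
    if any(cue in text for cue in INACTIVE_CUES):
        return 0
    if any(cue in text for cue in ACTIVE_CUES):
        return 1
    return 0  # no cue found: default to the negative class


def precision_recall_f1(gold, pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up notes with made-up gold labels, for illustration only.
notes = [
    "Patient walks daily with spouse.",
    "Sedentary lifestyle, no exercise.",
    "Patient exercises three times a week.",
    "Enjoys gardening on weekends.",  # active, but no rule matches it
]
gold = [1, 0, 1, 1]

weak = [weak_label(n) for n in notes]
precision, recall, f1 = precision_recall_f1(gold, weak)
print(weak, round(precision, 2), round(recall, 2), round(f1, 2))
```

In a weak-supervision setup, labels like `weak` would be used to train a downstream classifier on a much larger unlabeled corpus, while the small gold set is kept for evaluation only. The last note shows why the trained model can outperform the rules: a phrase the rules miss may still be learned from context.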


Similar Items