
Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study

BACKGROUND: The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with atten...


Bibliographic Details
Main Authors:	Chen, Pei-Fu; He, Tai-Liang; Lin, Sheng-Che; Chu, Yuan-Chia; Kuo, Chen-Tsung; Lai, Feipei; Wang, Ssu-Ming; Zhu, Wan-Xuan; Chen, Kuan-Chih; Kuo, Lu-Cheng; Hung, Fang-Ming; Lin, Yu-Cheng; Tsai, I-Chang; Chiu, Chi-Hao; Chang, Shu-Chih; Yang, Chi-Yu
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9693720/ https://www.ncbi.nlm.nih.gov/pubmed/36355417 http://dx.doi.org/10.2196/41342

author	Chen, Pei-Fu; He, Tai-Liang; Lin, Sheng-Che; Chu, Yuan-Chia; Kuo, Chen-Tsung; Lai, Feipei; Wang, Ssu-Ming; Zhu, Wan-Xuan; Chen, Kuan-Chih; Kuo, Lu-Cheng; Hung, Fang-Ming; Lin, Yu-Cheng; Tsai, I-Chang; Chiu, Chi-Hao; Chang, Shu-Chih; Yang, Chi-Yu
author_sort	Chen, Pei-Fu
collection	PubMed
description	BACKGROUND: The automatic coding of clinical text documents by using the International Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses and reimbursements. With the development of natural language processing models, new transformer architectures with attention mechanisms have outperformed previous models. Although multicenter training may increase a model’s performance and external validity, the privacy of clinical documents should be protected. We used federated learning to train a model with multicenter data, without sharing data per se.
OBJECTIVE: This study aims to train a classification model via federated learning for ICD-10 multilabel classification.
METHODS: Text data from discharge notes in electronic medical records were collected from the following three medical centers: Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital. After comparing the performance of different variants of bidirectional encoder representations from transformers (BERT), PubMedBERT was chosen for the word embeddings. With regard to preprocessing, the nonalphanumeric characters were retained because the model’s performance decreased after the removal of these characters. To explain the outputs of our model, we added a label attention mechanism to the model architecture. The model was trained with data from each of the three hospitals separately and via federated learning. The models trained via federated learning and the models trained with local data were compared on a testing set that was composed of data from the three hospitals. The micro F1 score was used to evaluate model performance across all 3 centers.
RESULTS: The F1 scores of PubMedBERT, RoBERTa (Robustly Optimized BERT Pretraining Approach), ClinicalBERT, and BioBERT (BERT for Biomedical Text Mining) were 0.735, 0.692, 0.711, and 0.721, respectively. The F1 score of the model that retained nonalphanumeric characters was 0.8120, whereas the F1 score after removing these characters was 0.7875, a decrease of 0.0245 (3.11%). The F1 scores on the testing set were 0.6142, 0.4472, 0.5353, and 0.2522 for the federated learning, Far Eastern Memorial Hospital, National Taiwan University Hospital, and Taipei Veterans General Hospital models, respectively. The explainable predictions were displayed with highlighted input words via the label attention architecture.
CONCLUSIONS: Federated learning was used to train the ICD-10 classification model on multicenter clinical text while protecting data privacy. The model’s performance was better than that of models that were trained locally.
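The abstract reports micro F1 scores for a multilabel task. As a rough illustration of how such a score is usually computed, the sketch below pools true positives, false positives, and false negatives over every (document, label) pair before forming F1. The function name `micro_f1` and the 0/1 indicator-matrix input are assumptions made for this example; the record does not describe the authors' evaluation code.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 over multilabel 0/1 indicator rows (hypothetical helper)."""
    tp = fp = fn = 0
    # Pool counts across all documents and all labels before computing F1.
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when nothing is positive.
    return 2 * tp / denom if denom else 0.0


# Toy data: two documents, three candidate codes each (illustrative only).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
```

Because the counts are pooled, frequent codes weigh more heavily in the result than rare ones, which is the usual distinction between micro and macro averaging.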
format	Online Article Text
id	pubmed-9693720
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9693720 2022-11-26 JMIR Med Inform Original Paper JMIR Publications 2022-11-10 /pmc/articles/PMC9693720/ /pubmed/36355417 http://dx.doi.org/10.2196/41342 Text en ©Pei-Fu Chen, Tai-Liang He, Sheng-Che Lin, Yuan-Chia Chu, Chen-Tsung Kuo, Feipei Lai, Ssu-Ming Wang, Wan-Xuan Zhu, Kuan-Chih Chen, Lu-Cheng Kuo, Fang-Ming Hung, Yu-Cheng Lin, I-Chang Tsai, Chi-Hao Chiu, Shu-Chih Chang, Chi-Yu Yang. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 10.11.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Training a Deep Contextualized Language Model for International Classification of Diseases, 10th Revision Classification via Federated Learning: Model Development and Validation Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9693720/ https://www.ncbi.nlm.nih.gov/pubmed/36355417 http://dx.doi.org/10.2196/41342
