Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches

BACKGROUND: The tenth revision of the International Classification of Diseases (ICD-10) is widely used for epidemiological research and health management. The clinical modification (CM) and procedure coding system (PCS) of ICD-10 were developed to describe more clinical details with increasing diagn...

Bibliographic Details
Main Authors:	Chen, Pei-Fu, Chen, Kuan-Chih, Liao, Wei-Chih, Lai, Feipei, He, Tai-Liang, Lin, Sheng-Che, Chen, Wei-Jen, Yang, Chi-Yu, Lin, Yu-Cheng, Tsai, I-Chang, Chiu, Chi-Hao, Chang, Shu-Chih, Hung, Fang-Ming
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9282222/ https://www.ncbi.nlm.nih.gov/pubmed/35767353 http://dx.doi.org/10.2196/37557

author	Chen, Pei-Fu Chen, Kuan-Chih Liao, Wei-Chih Lai, Feipei He, Tai-Liang Lin, Sheng-Che Chen, Wei-Jen Yang, Chi-Yu Lin, Yu-Cheng Tsai, I-Chang Chiu, Chi-Hao Chang, Shu-Chih Hung, Fang-Ming
author_sort	Chen, Pei-Fu
collection	PubMed
description	BACKGROUND: The tenth revision of the International Classification of Diseases (ICD-10) is widely used for epidemiological research and health management. The clinical modification (CM) and procedure coding system (PCS) of ICD-10 were developed to describe more clinical details with increasing diagnosis and procedure codes and applied in disease-related groups for reimbursement. The expansion of codes made the coding time-consuming and less accurate. The state-of-the-art model using deep contextual word embeddings was used for automatic multilabel text classification of ICD-10. In addition to discharge diagnoses (DD) as input, the performance can be improved by appropriate preprocessing methods for the text from other document types, such as medical history, comorbidity and complication, surgical method, and special examination.
OBJECTIVE: This study aims to establish a contextual language model with rule-based preprocessing methods to develop the model for ICD-10 multilabel classification.
METHODS: We retrieved electronic health records from a medical center. We first compared different word embedding methods. Second, we compared the preprocessing methods using the best-performing embeddings. We compared biomedical bidirectional encoder representations from transformers (BioBERT), clinical generalized autoregressive pretraining for language understanding (Clinical XLNet), label tree-based attention-aware deep model for high-performance extreme multilabel text classification (AttentionXML), and word-to-vector (Word2Vec) to predict ICD-10-CM. To compare different preprocessing methods for ICD-10-CM, we included DD, medical history, and comorbidity and complication as inputs. We compared the performance of ICD-10-CM prediction using different preprocessing steps, including definition training, external cause code removal, number conversion, and combination code filtering. For the ICD-10-PCS, the model was trained using different combinations of DD, surgical method, and keywords of special examination. The micro F1 score and the micro area under the receiver operating characteristic curve were used to compare the model’s performance with that of different preprocessing methods.
RESULTS: BioBERT had an F1 score of 0.701 and outperformed other models such as Clinical XLNet, AttentionXML, and Word2Vec. For the ICD-10-CM, the model had an F1 score that significantly increased from 0.749 (95% CI 0.744-0.753) to 0.769 (95% CI 0.764-0.773) with the ICD-10 definition training, external cause code removal, number conversion, and combination code filter. For the ICD-10-PCS, the model had an F1 score that significantly increased from 0.670 (95% CI 0.663-0.678) to 0.726 (95% CI 0.719-0.732) with a combination of discharge diagnoses, surgical methods, and keywords of special examination. With our preprocessing methods, the model had the highest area under the receiver operating characteristic curve of 0.853 (95% CI 0.849-0.855) and 0.831 (95% CI 0.827-0.834) for ICD-10-CM and ICD-10-PCS, respectively.
CONCLUSIONS: The performance of our model with the pretrained contextualized language model and rule-based preprocessing method is better than that of the state-of-the-art model for ICD-10-CM or ICD-10-PCS. This study highlights the importance of rule-based preprocessing methods based on coder coding rules.
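The abstract reports micro F1 scores for multilabel ICD-10 prediction. As a rough illustration of what that metric measures, the sketch below pools true positives, false positives, and false negatives over every (record, code) pair before computing F1. The function name `micro_f1`, the set-based representation, and the toy codes are assumptions made for this example; they are not taken from the paper, and the authors' evaluation code may differ.

```python
from typing import List, Set


def micro_f1(true_codes: List[Set[str]], predicted_codes: List[Set[str]]) -> float:
    """Micro-averaged F1: counts are pooled across all records and all codes."""
    if len(true_codes) != len(predicted_codes):
        raise ValueError("both lists must have one entry per record")
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, predicted_codes):
        tp += len(truth & pred)   # codes predicted and actually assigned
        fp += len(pred - truth)   # codes predicted but not assigned
        fn += len(truth - pred)   # codes assigned but not predicted
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 here when there are no codes at all
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: two records with illustrative ICD-10-CM codes.
gold = [{"I10", "E11.9"}, {"J18.9"}]
pred = [{"I10"}, {"J18.9", "N39.0"}]
score = micro_f1(gold, pred)
```

Because the counts are pooled before the ratio is taken, frequent codes weigh more heavily in micro F1 than rare ones, which is worth keeping in mind when reading the scores above.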
format	Online Article Text
id	pubmed-9282222
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9282222 2022-07-15 JMIR Med Inform Original Paper JMIR Publications 2022-06-29 /pmc/articles/PMC9282222/ /pubmed/35767353 http://dx.doi.org/10.2196/37557 Text en ©Pei-Fu Chen, Kuan-Chih Chen, Wei-Chih Liao, Feipei Lai, Tai-Liang He, Sheng-Che Lin, Wei-Jen Chen, Chi-Yu Yang, Yu-Cheng Lin, I-Chang Tsai, Chi-Hao Chiu, Shu-Chih Chang, Fang-Ming Hung. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 29.06.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Automatic International Classification of Diseases Coding System: Deep Contextualized Language Model With Rule-Based Approaches
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9282222/ https://www.ncbi.nlm.nih.gov/pubmed/35767353 http://dx.doi.org/10.2196/37557
