
Automated ICD coding for coronary heart diseases by a deep learning method


Bibliographic Details
Main Authors: Zhao, Shuai; Diao, Xiaolin; Xia, Yun; Huo, Yanni; Cui, Meng; Wang, Yuxin; Yuan, Jing; Zhao, Wei
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10018467/
https://www.ncbi.nlm.nih.gov/pubmed/36938427
http://dx.doi.org/10.1016/j.heliyon.2023.e14037
collection PubMed
description Automated ICD coding via machine learning for specific diseases has been a hot topic. Although coronary heart diseases (CHD) are among the leading causes of death, related research has seldom studied them specifically, probably because of a lack of data targeting these diseases. This study aimed at automated CHD coding by a deep learning method, using two datasets: Fuwai-CHD, a private dataset from Fuwai Hospital, and MIMIC-III-CHD, the CHD-related subset of the public MIMIC-III dataset. The method consists mainly of three modules. The first is a BERT variant module responsible for encoding clinical text. In this module, we fine-tuned BERT variants with masked language modeling on clinical text and proposed a truncation method to address the fact that BERT variants generally cannot handle sequences of more than 512 tokens. The second is a word2vec module for encoding code titles, and the third is a label-attention module for integrating the embeddings of clinical text and code titles. We named the method BW_att. We compared BW_att against several widely studied baselines and found that it performed best in most of the coding tasks. Specifically, BW_att reached a Macro-F1 of 96.2% and a Macro-AUC of 98.9% for the top-100 most frequent codes in Fuwai-CHD, which together cover 89.2% of all code occurrences. When predicting the top-50 most frequent codes in MIMIC-III-CHD, BW_att reached a Macro-F1 of 40.5% and a Macro-AUC of 66.1%. Moreover, BW_att was able to locate the tokens in clinical text that were informative for predicting the target codes. In summary, BW_att not only suggests CHD codes accurately but also offers robust interpretability, and hence has great potential to facilitate CHD coding in practice.
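The label-attention step mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that token embeddings (from the BERT variant) and code-title embeddings (from word2vec) have already been brought to the same dimension, that attention scores are plain dot products, and that all vectors and names (`label_attention`, `token_vecs`, `label_vecs`) are hypothetical toy values. It only shows how each code can weight the tokens of a note and how those weights point to informative tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_attention(token_vecs, label_vecs):
    """Attend over tokens once per label (code title).

    token_vecs: n token embeddings, each of dimension d (assumed given).
    label_vecs: L label embeddings, each of dimension d (assumed given).
    Returns (weights, pooled), where weights[l][i] is the attention of
    label l on token i and pooled[l] is the label-specific text vector.
    """
    weights, pooled = [], []
    for label in label_vecs:
        w = softmax([dot(tok, label) for tok in token_vecs])
        doc = [
            sum(w_i * tok[j] for w_i, tok in zip(w, token_vecs))
            for j in range(len(label))
        ]
        weights.append(w)
        pooled.append(doc)
    return weights, pooled


# Toy inputs: three tokens and two labels in a 2-dimensional space.
tokens = ["chest", "pain", "stent"]
token_vecs = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
label_vecs = [[2.0, 0.0], [0.0, 2.0]]

weights, pooled = label_attention(token_vecs, label_vecs)

# The highest-weighted token per label is a simple interpretability signal.
top_tokens = [tokens[max(range(len(w)), key=w.__getitem__)] for w in weights]
print(top_tokens)
```

In a full model, each pooled vector would feed a per-code classifier, and the attention weights could be inspected to see which tokens drove a prediction.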
id pubmed-10018467
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10018467 2023-03-17 Heliyon Research Article Elsevier 2023-02-27 /pmc/articles/PMC10018467/ /pubmed/36938427 http://dx.doi.org/10.1016/j.heliyon.2023.e14037 Text en © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Research Article