
A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation

BACKGROUND: The accumulation of medical documents in China has rapidly increased in the past years. We focus on developing a method that automatically performs ICD-10 code assignment to Chinese diagnoses from the electronic medical records to support the medical coding process in Chinese hospitals....


Bibliographic Details
Main authors: Ning, Wenxin, Yu, Ming, Zhang, Runtong
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4778321/
https://www.ncbi.nlm.nih.gov/pubmed/26940992
http://dx.doi.org/10.1186/s12911-016-0269-4
_version_ 1782419443822886912
author Ning, Wenxin
Yu, Ming
Zhang, Runtong
collection PubMed
description BACKGROUND: The accumulation of medical documents in China has increased rapidly in recent years. We focus on developing a method that automatically assigns ICD-10 codes to Chinese diagnoses from electronic medical records to support the medical coding process in Chinese hospitals. METHODS: We propose two encoding methods: one that directly determines the desired code (flat method), and one that hierarchically determines the most suitable code until the desired code is obtained (hierarchical method). Both methods are based on instances from the standard diagnostic library, a gold standard dataset in China. For the first time, semantic similarity estimation between Chinese words is applied in the biomedical domain, with the successful implementation of knowledge-based and distributional approaches. Characteristics of the Chinese language are considered in implementing distributional semantics. We test our methods against 16,330 coding instances from our partner hospital. RESULTS: The hierarchical method outperforms the flat method in terms of accuracy and time complexity. Representing distributional semantics using Chinese characters can achieve performance comparable to the use of Chinese words. The diagnoses in the test set can be encoded automatically with a micro-averaged precision of 92.57%, recall of 89.63%, and F-score of 91.08%. A sharp decrease in encoding performance is observed without semantic similarity estimation. CONCLUSION: The hierarchical nature of ICD-10 codes can enhance the performance of automated code assignment. Semantic similarity estimation is shown to be indispensable in dealing with Chinese medical text. The proposed method can greatly reduce the workload and improve the efficiency of the code assignment process in Chinese hospitals. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-016-0269-4) contains supplementary material, which is available to authorized users.
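Note on the reported figures: the abstract gives micro-averaged precision, recall, and F-score but does not state the F-score formula. The sketch below assumes the usual balanced F-score (the harmonic mean of precision and recall) and checks the arithmetic. The function name f_score and the variable names are illustrative and do not come from the article.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the article uses the balanced (F1) definition;
    the catalog record itself does not say which variant is meant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Micro-averaged values quoted in the abstract, in percent.
reported_precision = 92.57
reported_recall = 89.63

computed_f = f_score(reported_precision, reported_recall)
print(round(computed_f, 2))
```

Under that assumption the computed value rounds to 91.08, which agrees with the F-score quoted in the abstract.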
format Online
Article
Text
id pubmed-4778321
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4778321 2016-03-05 A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation Ning, Wenxin Yu, Ming Zhang, Runtong BMC Med Inform Decis Mak Research Article BioMed Central 2016-03-03 /pmc/articles/PMC4778321/ /pubmed/26940992 http://dx.doi.org/10.1186/s12911-016-0269-4 Text en © Ning et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Ning, Wenxin
Yu, Ming
Zhang, Runtong
A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation
title A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4778321/
https://www.ncbi.nlm.nih.gov/pubmed/26940992
http://dx.doi.org/10.1186/s12911-016-0269-4