Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment

BACKGROUND: The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical coding to the analysis of electronic health data for public health. It is, however, a complex task usually requiring human expert intervention, t...

Bibliographic Details
Main Authors:	Falissard, Louis; Morgand, Claire; Ghosn, Walid; Imbaud, Claire; Bounebache, Karim; Rey, Grégoire
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039820/ https://www.ncbi.nlm.nih.gov/pubmed/35404262 http://dx.doi.org/10.2196/26353

author	Falissard, Louis; Morgand, Claire; Ghosn, Walid; Imbaud, Claire; Bounebache, Karim; Rey, Grégoire
collection	PubMed
description	BACKGROUND: The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical coding to the analysis of electronic health data for public health. It is, however, a complex task usually requiring human expert intervention, thus making it expensive and time-consuming. Recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems, with the notable example of neural sequence models and their powerful applications in natural language processing. However, they require a considerable amount of data to learn from, which is typically their main limiting factor. The Centre for Epidemiology on Medical Causes of Death (CépiDc) stores an exhaustive database of death certificates at the French national scale, amounting to several million natural language examples provided with their associated human-coded medical entities available to the machine learning practitioner. OBJECTIVE: The aim of this paper was to investigate the application of deep neural sequence models to the problem of medical entity recognition from natural language. METHODS: The investigated data set included every French death certificate from 2011 to 2016. These certificates contain information such as the subject’s age, the subject’s gender, and the chain of events leading to his or her death, both in French and encoded as International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) medical entities, for a total of around 3 million observations in the data set. The task of automatically recognizing ICD-10 medical entities from the French natural language–based chain of events leading to death was then formulated as a type of predictive modeling problem known as a sequence-to-sequence modeling problem. 
A deep neural network–based model, known as the Transformer, was then slightly adapted and fit to the data set. Its performance was then assessed on an external data set and compared to the current state-of-the-art approach. CIs for derived measurements were estimated via bootstrapping. RESULTS: The proposed approach resulted in an F-measure value of 0.952 (95% CI 0.946-0.957), which constitutes a significant improvement over the current state-of-the-art approach and its previously reported F-measure value of 0.825 as assessed on a comparable data set. Such an improvement makes possible a whole field of new applications, from nosologist-level automated coding to temporal harmonization of death statistics. CONCLUSIONS: This paper shows that a deep artificial neural network can directly learn from voluminous data sets in order to identify complex relationships between natural language and medical entities, without any explicit prior knowledge. Although not entirely free from mistakes, the derived model constitutes a powerful tool for automated coding of medical entities from medical language with promising potential applications.
format	Online Article Text
id	pubmed-9039820
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9039820 2022-04-27 JMIR Med Inform Original Paper JMIR Publications 2022-04-11 /pmc/articles/PMC9039820/ /pubmed/35404262 http://dx.doi.org/10.2196/26353 Text en ©Louis Falissard, Claire Morgand, Walid Ghosn, Claire Imbaud, Karim Bounebache, Grégoire Rey. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.04.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039820/ https://www.ncbi.nlm.nih.gov/pubmed/35404262 http://dx.doi.org/10.2196/26353