
Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment

Bibliographic Details
Main Authors: Falissard, Louis, Morgand, Claire, Ghosn, Walid, Imbaud, Claire, Bounebache, Karim, Rey, Grégoire
Format: Online Article Text
Language: English
Published: JMIR Publications 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039820/
https://www.ncbi.nlm.nih.gov/pubmed/35404262
http://dx.doi.org/10.2196/26353
author Falissard, Louis
Morgand, Claire
Ghosn, Walid
Imbaud, Claire
Bounebache, Karim
Rey, Grégoire
collection PubMed
description BACKGROUND: The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical coding to the analysis of electronic health data for public health. It is, however, a complex task that usually requires human expert intervention, which makes it expensive and time-consuming. Recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems, a notable example being neural sequence models and their powerful applications in natural language processing. However, these models require a considerable amount of data to learn from, which is typically their main limiting factor. The Centre for Epidemiology on Medical Causes of Death (CépiDc) stores an exhaustive database of death certificates at the French national scale, amounting to several million natural language examples, each paired with its human-coded medical entities and available to machine learning practitioners.
OBJECTIVE: The aim of this paper was to investigate the application of deep neural sequence models to the problem of medical entity recognition from natural language.
METHODS: The investigated data set included every French death certificate from 2011 to 2016. These certificates contain information such as the subject’s age, the subject’s gender, and the chain of events leading to his or her death, both in French and encoded as International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) medical entities, for a total of around 3 million observations in the data set. The task of automatically recognizing ICD-10 medical entities from the French natural language–based chain of events leading to death was formulated as a sequence-to-sequence modeling problem, a type of predictive modeling problem. A deep neural network–based model, known as the Transformer, was slightly adapted and fit to the data set. Its performance was then assessed on an external data set and compared to the current state-of-the-art approach. CIs for derived measurements were estimated via bootstrapping.
RESULTS: The proposed approach resulted in an F-measure value of 0.952 (95% CI 0.946-0.957), which constitutes a significant improvement over the current state-of-the-art approach and its previously reported F-measure value of 0.825 as assessed on a comparable data set. Such an improvement makes possible a whole field of new applications, from nosologist-level automated coding to temporal harmonization of death statistics.
CONCLUSIONS: This paper shows that a deep artificial neural network can learn directly from voluminous data sets to identify complex relationships between natural language and medical entities, without any explicit prior knowledge. Although not entirely free from mistakes, the derived model constitutes a powerful tool for automated coding of medical entities from medical language, with promising potential applications.
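The abstract reports an F-measure with a bootstrapped 95% CI but does not spell out the scoring procedure. The sketch below is one plausible reading, not the authors' method: it assumes a micro-averaged F-measure over sets of ICD-10 codes per certificate and a percentile bootstrap that resamples certificates. The function names, the toy code sets, and the resampling settings are all hypothetical.

```python
import random


def f_measure(pairs):
    """Micro-averaged F-measure over (predicted, reference) code sets.

    Assumption: each certificate contributes a set of predicted codes and
    a set of reference codes; the abstract does not state the exact scheme.
    """
    tp = sum(len(pred & ref) for pred, ref in pairs)
    n_pred = sum(len(pred) for pred, _ in pairs)
    n_ref = sum(len(ref) for _, ref in pairs)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_ref
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample certificates with replacement."""
    rng = random.Random(seed)
    n = len(pairs)
    scores = sorted(
        f_measure([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy, made-up examples: (predicted codes, human-coded reference codes).
pairs = [
    ({"I219"}, {"I219"}),
    ({"C349", "J189"}, {"C349"}),
    ({"I64"}, {"I64", "I10"}),
    ({"J189"}, {"J180"}),
]
point = f_measure(pairs)
low, high = bootstrap_ci(pairs)
print(f"F-measure {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

With only four toy certificates the interval is very wide; on an evaluation set of realistic size the same procedure would yield a much tighter interval.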
format Online
Article
Text
id pubmed-9039820
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-9039820 2022-04-27 JMIR Med Inform Original Paper JMIR Publications 2022-04-11 /pmc/articles/PMC9039820/ /pubmed/35404262 http://dx.doi.org/10.2196/26353 Text en ©Louis Falissard, Claire Morgand, Walid Ghosn, Claire Imbaud, Karim Bounebache, Grégoire Rey. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.04.2022.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9039820/
https://www.ncbi.nlm.nih.gov/pubmed/35404262
http://dx.doi.org/10.2196/26353