
Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology


Bibliographic Details
Main Authors: Canales, Lea, Menke, Sebastian, Marchesseau, Stephanie, D’Agostino, Ariel, del Rio-Bermudez, Carlos, Taberna, Miren, Tello, Jorge
Format: Online Article Text
Language: English
Published: JMIR Publications 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8367121/
https://www.ncbi.nlm.nih.gov/pubmed/34297002
http://dx.doi.org/10.2196/20492
_version_ 1783739014283526144
author Canales, Lea
Menke, Sebastian
Marchesseau, Stephanie
D’Agostino, Ariel
del Rio-Bermudez, Carlos
Taberna, Miren
Tello, Jorge
author_facet Canales, Lea
Menke, Sebastian
Marchesseau, Stephanie
D’Agostino, Ariel
del Rio-Bermudez, Carlos
Taberna, Miren
Tello, Jorge
author_sort Canales, Lea
collection PubMed
description BACKGROUND: Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability in extracting clinically important information from free text contained in electronic health records (EHRs). The conversion of a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. Moreover, the interpretation of the insights provided by cNLP systems has great potential in driving decisions about clinical practice. However, carrying out robust evaluations of those cNLP systems is a complex task that is hindered by a lack of standard guidance on how to systematically approach them. OBJECTIVE: Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured. METHODS: The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard.
RESULTS: The application of the proposed evaluation methodology on a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use case, using as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs. CONCLUSIONS: We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.
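The abstract describes SLiCE as a calculator for the number of documents a gold standard needs so that performance metrics fall within expected confidence intervals. The core idea can be sketched with the standard sample-size formula for estimating a proportion (Cochran's formula with a finite population correction). This is a minimal illustration of that statistical principle, not the actual SLiCE API; the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(p_expected, margin, confidence=0.95, population=None):
    """Documents needed so an estimated proportion (e.g. the cNLP system's
    precision) falls within +/- margin at the given confidence level.
    Illustrative only -- not the actual SLiCE API."""
    # two-sided critical z value, e.g. ~1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an (effectively) infinite population
    n = (z ** 2) * p_expected * (1 - p_expected) / margin ** 2
    if population is not None:
        # finite population correction for a bounded document pool
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# Worst-case variance (p = 0.5), 95% CI of +/- 5 percentage points
print(sample_size_for_proportion(0.5, 0.05))                   # -> 385
# Same precision when only 1000 candidate EHRs exist
print(sample_size_for_proportion(0.5, 0.05, population=1000))  # -> 278
```

Adjusting the sample size this way is what lets an evaluation remain statistically meaningful with a few hundred annotated EHRs rather than thousands.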
format Online
Article
Text
id pubmed-8367121
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-83671212021-08-24 Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology Canales, Lea Menke, Sebastian Marchesseau, Stephanie D’Agostino, Ariel del Rio-Bermudez, Carlos Taberna, Miren Tello, Jorge JMIR Med Inform Original Paper BACKGROUND: Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability in extracting clinically important information from free text contained in electronic health records (EHRs). The conversion of a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. Moreover, the interpretation of the insights provided by cNLP systems has great potential in driving decisions about clinical practice. However, carrying out robust evaluations of those cNLP systems is a complex task that is hindered by a lack of standard guidance on how to systematically approach them. OBJECTIVE: Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured. METHODS: The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma.
As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard. RESULTS: The application of the proposed evaluation methodology on a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use case, using as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs. CONCLUSIONS: We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library. JMIR Publications 2021-07-23 /pmc/articles/PMC8367121/ /pubmed/34297002 http://dx.doi.org/10.2196/20492 Text en ©Lea Canales, Sebastian Menke, Stephanie Marchesseau, Ariel D’Agostino, Carlos del Rio-Bermudez, Miren Taberna, Jorge Tello. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.07.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Canales, Lea
Menke, Sebastian
Marchesseau, Stephanie
D’Agostino, Ariel
del Rio-Bermudez, Carlos
Taberna, Miren
Tello, Jorge
Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title_full Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title_fullStr Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title_full_unstemmed Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title_short Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology
title_sort assessing the performance of clinical natural language processing systems: development of an evaluation methodology
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8367121/
https://www.ncbi.nlm.nih.gov/pubmed/34297002
http://dx.doi.org/10.2196/20492
work_keys_str_mv AT canaleslea assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT menkesebastian assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT marchesseaustephanie assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT dagostinoariel assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT delriobermudezcarlos assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT tabernamiren assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology
AT tellojorge assessingtheperformanceofclinicalnaturallanguageprocessingsystemsdevelopmentofanevaluationmethodology