
Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database


Bibliographic Details
Main Authors: Dong, Tim; Sunderland, Nicholas; Nightingale, Angus; Fudulu, Daniel P.; Chan, Jeremy; Zhai, Ben; Freitas, Alberto; Caputo, Massimo; Dimagli, Arnaldo; Mires, Stuart; Wyatt, Mike; Benedetto, Umberto; Angelini, Gianni D.
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669818/
https://www.ncbi.nlm.nih.gov/pubmed/38002431
http://dx.doi.org/10.3390/bioengineering10111307
_version_ 1785139781163810816
author Dong, Tim
Sunderland, Nicholas
Nightingale, Angus
Fudulu, Daniel P.
Chan, Jeremy
Zhai, Ben
Freitas, Alberto
Caputo, Massimo
Dimagli, Arnaldo
Mires, Stuart
Wyatt, Mike
Benedetto, Umberto
Angelini, Gianni D.
author_sort Dong, Tim
collection PubMed
description Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse, because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis. Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146,967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and was iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p < 0.05) alongside high R² values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75–0.9, p < 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E’ Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. Conclusions: The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.
format Online
Article
Text
id pubmed-10669818
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10669818 2023-11-10 Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database Dong, Tim Sunderland, Nicholas Nightingale, Angus Fudulu, Daniel P. Chan, Jeremy Zhai, Ben Freitas, Alberto Caputo, Massimo Dimagli, Arnaldo Mires, Stuart Wyatt, Mike Benedetto, Umberto Angelini, Gianni D. Bioengineering (Basel) Article MDPI 2023-11-10 /pmc/articles/PMC10669818/ /pubmed/38002431 http://dx.doi.org/10.3390/bioengineering10111307 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10669818/
https://www.ncbi.nlm.nih.gov/pubmed/38002431
http://dx.doi.org/10.3390/bioengineering10111307