
Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing

Bibliographic Details
Main Authors: Pandey, Mohit; Xu, Zhuoran; Sholle, Evan; Maliakal, Gabriel; Singh, Gurpreet; Fatima, Zahra; Larine, Daria; Lee, Benjamin C.; Wang, Jing; van Rosendael, Alexander R.; Baskaran, Lohendran; Shaw, Leslee J.; Min, James K.; Al’Aref, Subhi J.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392233/
https://www.ncbi.nlm.nih.gov/pubmed/32730362
http://dx.doi.org/10.1371/journal.pone.0236827
_version_ 1783564805463867392
author Pandey, Mohit
Xu, Zhuoran
Sholle, Evan
Maliakal, Gabriel
Singh, Gurpreet
Fatima, Zahra
Larine, Daria
Lee, Benjamin C.
Wang, Jing
van Rosendael, Alexander R.
Baskaran, Lohendran
Shaw, Leslee J.
Min, James K.
Al’Aref, Subhi J.
author_sort Pandey, Mohit
collection PubMed
description BACKGROUND: Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured in the form of radiology reports, while the process of data collection and curation is arduous and time-consuming. PURPOSE: We utilized a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigated the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients. MATERIALS AND METHODS: This observational cohort study utilized 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. Thereafter, a Convolutional Neural Network (CNN) was trained, validated and tested to determine the presence or absence of these features. Further, the ability of the CNN to predict ACM was evaluated using Cox regression analysis on the extracted features. RESULTS: 11,808 CT reports were analyzed from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% (6,217/11,808) male), of whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy for retrieval of the 14 radiographic findings, with area-under-the-curve (AUC) ranging from 0.83 to 1.00 (F1 score 0.84–0.97). The Cox model showed that the time-dependent AUC for predicting ACM was 0.747 (95% confidence interval [CI] of 0.704–0.790) at 30 days. CONCLUSION: An ML-based NLP approach to unstructured CT reports demonstrates excellent accuracy for the extraction of predetermined radiographic findings, and provides prognostic value in HF patients.
format Online
Article
Text
id pubmed-7392233
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-73922332020-08-05 Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing Pandey, Mohit Xu, Zhuoran Sholle, Evan Maliakal, Gabriel Singh, Gurpreet Fatima, Zahra Larine, Daria Lee, Benjamin C. Wang, Jing van Rosendael, Alexander R. Baskaran, Lohendran Shaw, Leslee J. Min, James K. Al’Aref, Subhi J. PLoS One Research Article BACKGROUND: Heart failure (HF) is a major cause of morbidity and mortality. However, much of the clinical data is unstructured in the form of radiology reports, while the process of data collection and curation is arduous and time-consuming. PURPOSE: We utilized a machine learning (ML)-based natural language processing (NLP) approach to extract clinical terms from unstructured radiology reports. Additionally, we investigate the prognostic value of the extracted data in predicting all-cause mortality (ACM) in HF patients. MATERIALS AND METHODS: This observational cohort study utilized 122,025 thoracoabdominal computed tomography (CT) reports from 11,808 HF patients obtained between 2008 and 2018. 1,560 CT reports were manually annotated for the presence or absence of 14 radiographic findings, in addition to age and gender. Thereafter, a Convolutional Neural Network (CNN) was trained, validated and tested to determine the presence or absence of these features. Further, the ability of CNN to predict ACM was evaluated using Cox regression analysis on the extracted features. RESULTS: 11,808 CT reports were analyzed from 11,808 patients (mean age 72.8 ± 14.8 years; 52.7% (6,217/11,808) male) from whom 3,107 died during the 10.6-year follow-up. The CNN demonstrated excellent accuracy for retrieval of the 14 radiographic findings with area-under-the-curve (AUC) ranging between 0.83–1.00 (F1 score 0.84–0.97). 
Cox model showed the time-dependent AUC for predicting ACM was 0.747 (95% confidence interval [CI] of 0.704–0.790) at 30 days. CONCLUSION: An ML-based NLP approach to unstructured CT reports demonstrates excellent accuracy for the extraction of predetermined radiographic findings, and provides prognostic value in HF patients. Public Library of Science 2020-07-30 /pmc/articles/PMC7392233/ /pubmed/32730362 http://dx.doi.org/10.1371/journal.pone.0236827 Text en © 2020 Pandey et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Extraction of radiographic findings from unstructured thoracoabdominal computed tomography reports using convolutional neural network based natural language processing
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392233/
https://www.ncbi.nlm.nih.gov/pubmed/32730362
http://dx.doi.org/10.1371/journal.pone.0236827