
Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports

IMPORTANCE: Clinical text reports from head computed tomography (CT) represent rich, incompletely utilized information regarding acute brain injuries and neurologic outcomes. CT reports are unstructured; thus, extracting information at scale requires automated natural language processing (NLP). Howe...


Bibliographic Details
Main authors: Torres-Lopez, Victor M., Rovenolt, Grace E., Olcese, Angelo J., Garcia, Gabriella E., Chacko, Sarah M., Robinson, Amber, Gaiser, Edward, Acosta, Julian, Herman, Alison L., Kuohn, Lindsey R., Leary, Megan, Soto, Alexandria L., Zhang, Qiang, Fatima, Safoora, Falcone, Guido J., Payabvash, M. Seyedmehdi, Sharma, Richa, Struck, Aaron F., Sheth, Kevin N., Westover, M. Brandon, Kim, Jennifer A.
Format: Online Article Text
Language: English
Published: American Medical Association 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382443/
https://www.ncbi.nlm.nih.gov/pubmed/35972739
http://dx.doi.org/10.1001/jamanetworkopen.2022.27109
author Torres-Lopez, Victor M.
Rovenolt, Grace E.
Olcese, Angelo J.
Garcia, Gabriella E.
Chacko, Sarah M.
Robinson, Amber
Gaiser, Edward
Acosta, Julian
Herman, Alison L.
Kuohn, Lindsey R.
Leary, Megan
Soto, Alexandria L.
Zhang, Qiang
Fatima, Safoora
Falcone, Guido J.
Payabvash, M. Seyedmehdi
Sharma, Richa
Struck, Aaron F.
Sheth, Kevin N.
Westover, M. Brandon
Kim, Jennifer A.
author_sort Torres-Lopez, Victor M.
collection PubMed
description IMPORTANCE: Clinical text reports from head computed tomography (CT) represent rich, incompletely utilized information regarding acute brain injuries and neurologic outcomes. CT reports are unstructured; thus, extracting information at scale requires automated natural language processing (NLP). However, designing new NLP algorithms for each individual injury category is an unwieldy proposition. An NLP tool that summarizes all injuries in head CT reports would facilitate exploration of large data sets for clinical significance of neuroradiological findings.
OBJECTIVE: To automatically extract acute brain pathological data and their features from head CT reports.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study developed a 2-part named entity recognition (NER) NLP model to extract and summarize data on acute brain injuries from head CT reports. The model, termed BrainNERD, extracts and summarizes detailed brain injury information for research applications. Model development included building and comparing 2 NER models using a custom dictionary of terms, including lesion type, location, size, and age, then designing a rule-based decoder using NER outputs to evaluate for the presence or absence of injury subtypes. BrainNERD was evaluated against independent test data sets of manually classified reports, including 2 external validation sets. The model was trained on head CT reports from 1152 patients generated by neuroradiologists at the Yale Acute Brain Injury Biorepository. External validation was conducted using reports from 2 outside institutions. Analyses were conducted from May 2020 to December 2021.
MAIN OUTCOMES AND MEASURES: Performance of the BrainNERD model was evaluated using precision, recall, and F1 scores based on manually labeled independent test data sets.
RESULTS: A total of 1152 patients (mean [SD] age, 67.6 [16.1] years; 586 [52%] men) were included in the training set. NER training using transformer architecture and bidirectional encoder representations from transformers was significantly faster than training with spaCy. For all metrics, the 10-fold cross-validation performance was 93% to 99%. The final test performance metrics for the NER test data set were 98.82% (95% CI, 98.37%-98.93%) for precision, 98.81% (95% CI, 98.46%-99.06%) for recall, and 98.81% (95% CI, 98.40%-98.94%) for the F1 score. The expert review comparison metrics were 99.06% (95% CI, 97.89%-99.13%) for precision, 98.10% (95% CI, 97.93%-98.77%) for recall, and 98.57% (95% CI, 97.78%-99.10%) for the F1 score. The decoder test set metrics were 96.06% (95% CI, 95.01%-97.16%) for precision, 96.42% (95% CI, 94.50%-97.87%) for recall, and 96.18% (95% CI, 95.151%-97.16%) for the F1 score. Performance in external institution report validation including 1053 head CT reports was greater than 96%.
CONCLUSIONS AND RELEVANCE: These findings suggest that the BrainNERD model accurately extracted acute brain injury terms and their properties from head CT text reports. This freely available new tool could advance clinical research by integrating information in easily gathered head CT reports to expand knowledge of acute brain injury radiographic phenotypes.
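The abstract reports precision, recall, and F1 for the NER output but does not say how a predicted entity was matched against the manual labels. The sketch below shows one common way to compute the three metrics, assuming exact-match scoring (same character offsets and same label). The `Span` type, the label names, and the example annotations are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One labeled entity: character offsets plus an entity label (hypothetical type)."""
    start: int
    end: int
    label: str


def precision_recall_f1(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if offsets and label both match."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations for a single report (not data from the study).
gold = {Span(0, 8, "LESION"), Span(17, 29, "LOCATION"), Span(31, 35, "SIZE")}
predicted = {Span(0, 8, "LESION"), Span(17, 29, "LOCATION"), Span(40, 45, "AGE")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# prints: precision=0.667 recall=0.667 F1=0.667
```

Here two of the three predicted spans match the manual labels, so all three metrics equal 2/3. Looser matching rules (for example, partial overlap) would give different numbers, which is why the matching rule is flagged as an assumption.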
format Online
Article
Text
id pubmed-9382443
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-9382443 2022-08-30 JAMA Netw Open Original Investigation American Medical Association 2022-08-16 /pmc/articles/PMC9382443/ /pubmed/35972739 http://dx.doi.org/10.1001/jamanetworkopen.2022.27109 Text en Copyright 2022 Torres-Lopez VM et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports
topic Original Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9382443/
https://www.ncbi.nlm.nih.gov/pubmed/35972739
http://dx.doi.org/10.1001/jamanetworkopen.2022.27109