Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports

Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could provide considerable improvement in id...

Bibliographic Details
Main Authors: Ong, Charlene Jennifer, Orfanoudaki, Agni, Zhang, Rebecca, Caprasse, Francois Pierre M., Hutch, Meghan, Ma, Liang, Fard, Darian, Balogun, Oluwafemi, Miller, Matthew I., Minnig, Margaret, Saglam, Hanife, Prescott, Brenton, Greer, David M., Smirnakis, Stelios, Bertsimas, Dimitris
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304623/
https://www.ncbi.nlm.nih.gov/pubmed/32559211
http://dx.doi.org/10.1371/journal.pone.0234908
author Ong, Charlene Jennifer
Orfanoudaki, Agni
Zhang, Rebecca
Caprasse, Francois Pierre M.
Hutch, Meghan
Ma, Liang
Fard, Darian
Balogun, Oluwafemi
Miller, Matthew I.
Minnig, Margaret
Saglam, Hanife
Prescott, Brenton
Greer, David M.
Smirnakis, Stelios
Bertsimas, Dimitris
author_sort Ong, Charlene Jennifer
collection PubMed
description Accurate, automated extraction of clinical stroke information from unstructured text has several important applications. ICD-9/10 codes can misclassify ischemic stroke events and do not distinguish acuity or location. Expeditious, accurate data extraction could considerably improve the identification of stroke in large datasets, the triage of critical clinical reports, and quality improvement efforts. In this study, we developed and report a comprehensive framework for studying the performance of simple and complex stroke-specific Natural Language Processing (NLP) and Machine Learning (ML) methods to determine the presence, location, and acuity of ischemic stroke from radiographic text. We collected 60,564 Computed Tomography and Magnetic Resonance Imaging radiology reports from 17,864 patients at two large academic medical centers. We used standard techniques to featurize unstructured text and developed neurovascular-specific GloVe word embeddings. We trained various binary classification algorithms to identify stroke presence, location, and acuity using 75% of 1,359 expert-labeled reports. We validated our methods internally on the remaining 25% of reports and externally on 500 radiology reports from an entirely separate academic institution. In our internal population, GloVe word embeddings paired with deep learning (Recurrent Neural Networks) had the best discrimination of all methods for our three tasks (AUCs of 0.96, 0.98, and 0.93, respectively). Simpler NLP approaches (Bag of Words, BOW) performed best with interpretable algorithms (Logistic Regression) for identifying ischemic stroke (AUC of 0.95), MCA location (AUC of 0.96), and acuity (AUC of 0.90). Similarly, GloVe and Recurrent Neural Networks (AUCs of 0.92, 0.89, and 0.93) generalized better in our external test set than BOW and Logistic Regression (AUCs of 0.89, 0.86, and 0.80) for stroke presence, location, and acuity, respectively. Our study demonstrates a comprehensive assessment of NLP techniques for unstructured radiographic text. Our findings suggest that NLP/ML methods can be used to discriminate stroke features in large data cohorts for both clinical and research-related investigations.
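The simpler pipeline named in the description (Bag of Words features, Logistic Regression, a 75%/25% split, AUC as the metric) can be illustrated with a minimal, standard-library-only sketch. This record does not include the authors' code or data, so everything below is an assumption made for illustration: the eight one-line "reports" are invented, the function names are hypothetical, and plain batch gradient descent stands in for whatever solver the study used. The resulting AUC on such toy data says nothing about the published results.

```python
import math
import re
from collections import Counter

# Invented toy reports (NOT study data); label 1 = ischemic stroke present.
REPORTS = [
    ("acute infarct in the left mca territory", 1),
    ("no acute intracranial abnormality", 0),
    ("restricted diffusion consistent with acute ischemic stroke", 1),
    ("no evidence of infarct or hemorrhage", 0),
    ("evolving infarct in the right mca distribution", 1),
    ("unremarkable study with no restricted diffusion", 0),
    ("acute ischemic infarct in the left mca territory", 1),
    ("no acute infarct or hemorrhage", 0),
]

def tokenize(text):
    """Lowercase a report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(texts):
    """Map each distinct training token to a column index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words count vector; out-of-vocabulary tokens are ignored."""
    counts = Counter(tokenize(text))
    return [float(counts.get(tok, 0)) for tok in vocab]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X, y, epochs=300, lr=0.1):
    """Fit logistic regression by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# 75% of the labeled reports for training, the remaining 25% for testing.
split = int(0.75 * len(REPORTS))
train, test = REPORTS[:split], REPORTS[split:]

vocab = build_vocab([text for text, _ in train])
X_train = [featurize(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
w, b = train_logreg(X_train, y_train)

test_scores = [predict_proba(featurize(text, vocab), w, b) for text, _ in test]
test_auc = auc([label for _, label in test], test_scores)
print(f"held-out AUC on toy data: {test_auc:.2f}")
```

The vocabulary is built from the training portion only, so the held-out reports are scored using only words the model has already seen. That mirrors, in miniature, why an external test set from a separate institution is a stricter check than an internal split.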
format Online
Article
Text
id pubmed-7304623
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7304623 2020-06-22 PLoS One Research Article Public Library of Science 2020-06-19 /pmc/articles/PMC7304623/ /pubmed/32559211 http://dx.doi.org/10.1371/journal.pone.0234908 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Machine learning and natural language processing methods to identify ischemic stroke, acuity and location from radiology reports
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304623/
https://www.ncbi.nlm.nih.gov/pubmed/32559211
http://dx.doi.org/10.1371/journal.pone.0234908