
ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports

INTRODUCTION: Accurate identification of venous thromboembolism (VTE) is critical to develop replicable epidemiological studies and rigorous predictions models. Traditionally, VTE studies have relied on international classification of diseases (ICD) codes which are inaccurate – leading to misclassif...


Bibliographic Details
Main Authors: Wang, Jeffrey; de Vale, Joao Souza; Gupta, Saransh; Upadhyaya, Pulakesh; Lisboa, Felipe A.; Schobel, Seth A.; Elster, Eric A.; Dente, Christopher J.; Buchman, Timothy G.; Kamaleswaran, Rishikesan
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652606/
https://www.ncbi.nlm.nih.gov/pubmed/37974186
http://dx.doi.org/10.1186/s12911-023-02369-z
_version_ 1785147720448606208
author Wang, Jeffrey
de Vale, Joao Souza
Gupta, Saransh
Upadhyaya, Pulakesh
Lisboa, Felipe A.
Schobel, Seth A.
Elster, Eric A.
Dente, Christopher J.
Buchman, Timothy G.
Kamaleswaran, Rishikesan
author_facet Wang, Jeffrey
de Vale, Joao Souza
Gupta, Saransh
Upadhyaya, Pulakesh
Lisboa, Felipe A.
Schobel, Seth A.
Elster, Eric A.
Dente, Christopher J.
Buchman, Timothy G.
Kamaleswaran, Rishikesan
author_sort Wang, Jeffrey
collection PubMed
description INTRODUCTION: Accurate identification of venous thromboembolism (VTE) is critical for developing replicable epidemiological studies and rigorous prediction models. Traditionally, VTE studies have relied on International Classification of Diseases (ICD) codes, which are inaccurate, leading to misclassification bias. Here, we developed ClotCatcher, a novel deep learning model that uses natural language processing to detect VTE from radiology reports. METHODS: Radiology reports to detect VTE were obtained from patients admitted to Emory University Hospital (EUH) and Grady Memorial Hospital (GMH). Data augmentation was performed using the Google PEGASUS paraphraser. This data was then used to fine-tune ClotCatcher, a novel deep learning model. ClotCatcher was validated on both the EUH dataset alone and the GMH dataset alone. RESULTS: The dataset contained 1358 studies from EUH and 915 studies from GMH (n = 2273). The dataset contained 1506 ultrasound studies, with 528 (35.1%) studies positive for VTE, and 767 CT studies, with 91 (11.9%) positive for VTE. When validated on the EUH dataset, ClotCatcher performed best (AUC = 0.980) when trained on both the EUH and GMH datasets without paraphrasing. When validated on the GMH dataset, ClotCatcher performed best (AUC = 0.995) when trained on both the EUH and GMH datasets with paraphrasing. CONCLUSION: ClotCatcher, a novel deep learning model with data augmentation, rapidly and accurately adjudicated the presence of VTE from radiology reports. Applying ClotCatcher to large databases would allow for rapid and accurate adjudication of incident VTE. This would reduce misclassification bias and form the foundation for future studies to estimate a patient's individual risk of developing incident VTE. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02369-z.
format Online
Article
Text
id pubmed-10652606
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10652606 2023-11-16 ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports Wang, Jeffrey de Vale, Joao Souza Gupta, Saransh Upadhyaya, Pulakesh Lisboa, Felipe A. Schobel, Seth A. Elster, Eric A. Dente, Christopher J. Buchman, Timothy G. Kamaleswaran, Rishikesan BMC Med Inform Decis Mak Research INTRODUCTION: Accurate identification of venous thromboembolism (VTE) is critical for developing replicable epidemiological studies and rigorous prediction models. Traditionally, VTE studies have relied on International Classification of Diseases (ICD) codes, which are inaccurate, leading to misclassification bias. Here, we developed ClotCatcher, a novel deep learning model that uses natural language processing to detect VTE from radiology reports. METHODS: Radiology reports to detect VTE were obtained from patients admitted to Emory University Hospital (EUH) and Grady Memorial Hospital (GMH). Data augmentation was performed using the Google PEGASUS paraphraser. This data was then used to fine-tune ClotCatcher, a novel deep learning model. ClotCatcher was validated on both the EUH dataset alone and the GMH dataset alone. RESULTS: The dataset contained 1358 studies from EUH and 915 studies from GMH (n = 2273). The dataset contained 1506 ultrasound studies, with 528 (35.1%) studies positive for VTE, and 767 CT studies, with 91 (11.9%) positive for VTE. When validated on the EUH dataset, ClotCatcher performed best (AUC = 0.980) when trained on both the EUH and GMH datasets without paraphrasing. When validated on the GMH dataset, ClotCatcher performed best (AUC = 0.995) when trained on both the EUH and GMH datasets with paraphrasing. CONCLUSION: ClotCatcher, a novel deep learning model with data augmentation, rapidly and accurately adjudicated the presence of VTE from radiology reports. Applying ClotCatcher to large databases would allow for rapid and accurate adjudication of incident VTE. This would reduce misclassification bias and form the foundation for future studies to estimate a patient's individual risk of developing incident VTE. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02369-z. BioMed Central 2023-11-16 /pmc/articles/PMC10652606/ /pubmed/37974186 http://dx.doi.org/10.1186/s12911-023-02369-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Wang, Jeffrey
de Vale, Joao Souza
Gupta, Saransh
Upadhyaya, Pulakesh
Lisboa, Felipe A.
Schobel, Seth A.
Elster, Eric A.
Dente, Christopher J.
Buchman, Timothy G.
Kamaleswaran, Rishikesan
ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title_full ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title_fullStr ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title_full_unstemmed ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title_short ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
title_sort clotcatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652606/
https://www.ncbi.nlm.nih.gov/pubmed/37974186
http://dx.doi.org/10.1186/s12911-023-02369-z