ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports
INTRODUCTION: Accurate identification of venous thromboembolism (VTE) is critical to developing replicable epidemiological studies and rigorous prediction models. Traditionally, VTE studies have relied on International Classification of Diseases (ICD) codes, which are inaccurate, leading to misclassif...
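Main authors: | Wang, Jeffrey; de Vale, Joao Souza; Gupta, Saransh; Upadhyaya, Pulakesh; Lisboa, Felipe A.; Schobel, Seth A.; Elster, Eric A.; Dente, Christopher J.; Buchman, Timothy G.; Kamaleswaran, Rishikesan |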
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652606/ https://www.ncbi.nlm.nih.gov/pubmed/37974186 http://dx.doi.org/10.1186/s12911-023-02369-z |
_version_ | 1785147720448606208 |
---|---|
author | Wang, Jeffrey; de Vale, Joao Souza; Gupta, Saransh; Upadhyaya, Pulakesh; Lisboa, Felipe A.; Schobel, Seth A.; Elster, Eric A.; Dente, Christopher J.; Buchman, Timothy G.; Kamaleswaran, Rishikesan |
author_facet | Wang, Jeffrey; de Vale, Joao Souza; Gupta, Saransh; Upadhyaya, Pulakesh; Lisboa, Felipe A.; Schobel, Seth A.; Elster, Eric A.; Dente, Christopher J.; Buchman, Timothy G.; Kamaleswaran, Rishikesan |
author_sort | Wang, Jeffrey |
collection | PubMed |
description | INTRODUCTION: Accurate identification of venous thromboembolism (VTE) is critical to developing replicable epidemiological studies and rigorous prediction models. Traditionally, VTE studies have relied on International Classification of Diseases (ICD) codes, which are inaccurate, leading to misclassification bias. Here, we developed ClotCatcher, a novel deep learning model that uses natural language processing to detect VTE from radiology reports. METHODS: Radiology reports to detect VTE were obtained from patients admitted to Emory University Hospital (EUH) and Grady Memorial Hospital (GMH). Data augmentation was performed using the Google PEGASUS paraphraser. These data were then used to fine-tune ClotCatcher. ClotCatcher was validated on the EUH dataset alone and on the GMH dataset alone. RESULTS: The dataset contained 1358 studies from EUH and 915 studies from GMH (n = 2273). The dataset contained 1506 ultrasound studies, of which 528 (35.1%) were positive for VTE, and 767 CT studies, of which 91 (11.9%) were positive for VTE. When validated on the EUH dataset, ClotCatcher performed best (AUC = 0.980) when trained on both the EUH and GMH datasets without paraphrasing. When validated on the GMH dataset, ClotCatcher performed best (AUC = 0.995) when trained on both the EUH and GMH datasets with paraphrasing. CONCLUSION: ClotCatcher, a novel deep learning model with data augmentation, rapidly and accurately adjudicated the presence of VTE from radiology reports. Applying ClotCatcher to large databases would allow for rapid and accurate adjudication of incident VTE. This would reduce misclassification bias and form the foundation for future studies to estimate the individual risk for patients to develop incident VTE. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-023-02369-z. |
format | Online Article Text |
id | pubmed-10652606 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10652606 2023-11-16 ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports Wang, Jeffrey; de Vale, Joao Souza; Gupta, Saransh; Upadhyaya, Pulakesh; Lisboa, Felipe A.; Schobel, Seth A.; Elster, Eric A.; Dente, Christopher J.; Buchman, Timothy G.; Kamaleswaran, Rishikesan BMC Med Inform Decis Mak Research BioMed Central 2023-11-16 /pmc/articles/PMC10652606/ /pubmed/37974186 http://dx.doi.org/10.1186/s12911-023-02369-z Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Wang, Jeffrey de Vale, Joao Souza Gupta, Saransh Upadhyaya, Pulakesh Lisboa, Felipe A. Schobel, Seth A. Elster, Eric A. Dente, Christopher J. Buchman, Timothy G. Kamaleswaran, Rishikesan ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports |
title | ClotCatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports |
title_sort | clotcatcher: a novel natural language model to accurately adjudicate venous thromboembolism from radiology reports |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652606/ https://www.ncbi.nlm.nih.gov/pubmed/37974186 http://dx.doi.org/10.1186/s12911-023-02369-z |
work_keys_str_mv | AT wangjeffrey clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT devalejoaosouza clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT guptasaransh clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT upadhyayapulakesh clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT lisboafelipea clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT schobelsetha clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT elstererica clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT dentechristopherj clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT buchmantimothyg clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports AT kamaleswaranrishikesan clotcatcheranovelnaturallanguagemodeltoaccuratelyadjudicatevenousthromboembolismfromradiologyreports |
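
The counts reported in the RESULTS part of the description can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original record: the variable and function names are illustrative, and the overall positive rate is a derived figure that the abstract itself does not state.

```python
# Counts as reported in the abstract of this record.
studies_by_site = {"EUH": 1358, "GMH": 915}
studies_by_modality = {"ultrasound": 1506, "CT": 767}
positives_by_modality = {"ultrasound": 528, "CT": 91}

# The site totals and the modality totals should describe the same dataset.
total_by_site = sum(studies_by_site.values())
total_by_modality = sum(studies_by_modality.values())


def positive_rate(modality: str) -> float:
    """Percentage of studies positive for VTE, rounded to one decimal place."""
    positives = positives_by_modality[modality]
    studies = studies_by_modality[modality]
    return round(100 * positives / studies, 1)


rates = {modality: positive_rate(modality) for modality in studies_by_modality}

# Derived here for illustration only; not reported in the abstract.
overall_rate = round(
    100 * sum(positives_by_modality.values()) / total_by_modality, 1
)

print(total_by_site, total_by_modality)
print(rates)
print(overall_rate)
```

Both totals come to the reported n = 2273, and the per-modality rates round to the reported 35.1% and 11.9%, so the figures in the abstract are internally consistent.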