
2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment

BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data f...


Bibliographic Details
Main Authors: Raghuram, Anupama, Ssentongo, Anna, Pereira, Lenon Mendes, Kuang, Yuting, Uyei, Jennifer, Park, Peter
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10678089/
http://dx.doi.org/10.1093/ofid/ofad500.1999
_version_ 1785150283006869504
author Raghuram, Anupama
Ssentongo, Anna
Pereira, Lenon Mendes
Kuang, Yuting
Uyei, Jennifer
Park, Peter
author_facet Raghuram, Anupama
Ssentongo, Anna
Pereira, Lenon Mendes
Kuang, Yuting
Uyei, Jennifer
Park, Peter
author_sort Raghuram, Anupama
collection PubMed
description BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data from relevant articles presents an opportunity to use natural language processing (NLP) and machine learning to analyze large volumes of text related to COVID-19. We evaluated the use of NLP to rapidly identify and extract RWD from vaccine and antiviral effectiveness studies to summarize the data for healthcare professionals. METHODS: We used a two-step approach comprising (1) an automated NLP system with machine learning and rule-based methods to extract medical data from articles and (2) a manual evidence synthesis approach to confirm accuracy by expert review, in order to create an RWD dataset for effectiveness related to BNT162b2 (Pfizer-BioNTech COVID-19 Vaccine) and nirmatrelvir/ritonavir (Paxlovid). NLP linguistic models for automatic data extraction were developed for topics related to vaccine effectiveness, boosters, fourth dose, variant-adapted vaccines, antiviral treatment effectiveness, and long COVID. RWD were captured in a novel Excel data template. RESULTS: NLP model training and development were conducted on the CORD-19 Open Research Dataset, and NLP-based extraction was applied to 100 studies. Elapsed NLP-based extraction time was compared to manual extraction by 6 scientific experts over a 10-day test, and speed gains of 2.5-3.0x were achieved. NLP-based and manual extraction of data from >130 publications was then used to present individual study results and summarize effectiveness results on two publicly available websites (www.vaccinemedicaldata.com; www.antiviralmedicaldata.com) with high data accuracy. NLP also allowed for automated labeling and output normalization of data into a scalable model using ontologies. CONCLUSION: NLP is an efficient technology for automated extraction of medical data from published literature for vaccine and antiviral effectiveness topics. Automated medical data mining using NLP may support end-to-end extraction of study results to quickly educate prescribers and improve vaccine and antiviral healthcare decision making. DISCLOSURES: Anupama Raghuram, MD, Merck: Employee|Merck: Stocks/Bonds|Pfizer: Employee|Pfizer: Stocks/Bonds Anna Ssentongo, PhD, IQVIA: Employee Lenon Mendes Pereira, PhD, IQVIA: Employee Yuting Kuang, PhD, IQVIA: Employee Jennifer Uyei, PhD, MPH, IQVIA: Employee|IQVIA: Stocks/Bonds Peter Park, PhD, Pfizer: Employee|Pfizer: Stocks/Bonds
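The two-step approach in METHODS (automated rule-based extraction, then expert confirmation) can be illustrated with a minimal Python sketch. This is not the authors' system: the regular expression, the `Extraction` record, the `extract_effectiveness` function, and the sample sentence are all hypothetical and exist only to show the general shape of a rule that pulls an effectiveness estimate with its 95% confidence interval out of free text and queues it for manual review.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical rule: "<value>% (95% CI <low>-<high>)" inside a sentence
# that mentions "effectiveness". Real abstracts vary far more than this.
EFFECT_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*%\s*"
    r"\(\s*95%\s*CI[:,]?\s*"
    r"(?P<low>\d+(?:\.\d+)?)\s*(?:-|\u2013|to)\s*(?P<high>\d+(?:\.\d+)?)"
    r"\s*%?\s*\)",
    re.IGNORECASE,
)


@dataclass
class Extraction:
    sentence: str
    value: float
    ci_low: float
    ci_high: float
    # Step 2 of the approach: every automated hit still goes to an expert.
    needs_review: bool = True


def extract_effectiveness(text: str) -> List[Extraction]:
    """Step 1 (assumed form): rule-based extraction of effectiveness estimates."""
    results: List[Extraction] = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if "effectiveness" not in sentence.lower():
            continue
        for m in EFFECT_RE.finditer(sentence):
            results.append(
                Extraction(
                    sentence=sentence.strip(),
                    value=float(m["value"]),
                    ci_low=float(m["low"]),
                    ci_high=float(m["high"]),
                )
            )
    return results


if __name__ == "__main__":
    # Made-up sentence for illustration only; not a result from any study.
    sample = (
        "Vaccine effectiveness against hospitalization was 85% (95% CI 80-90). "
        "The cohort included 1,000 adults."
    )
    for hit in extract_effectiveness(sample):
        print(hit)
```

In a workflow like the one described, rows produced this way would be written to a spreadsheet template and only released after a reviewer clears the `needs_review` flag; the machine learning component mentioned in the abstract is not represented here.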
format Online
Article
Text
id pubmed-10678089
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10678089 2023-11-27 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment Raghuram, Anupama Ssentongo, Anna Pereira, Lenon Mendes Kuang, Yuting Uyei, Jennifer Park, Peter Open Forum Infect Dis Abstract BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data from relevant articles presents an opportunity to use natural language processing (NLP) and machine learning to analyze large volumes of text related to COVID-19. We evaluated the use of NLP to rapidly identify and extract RWD from vaccine and antiviral effectiveness studies to summarize the data for healthcare professionals. METHODS: We used a two-step approach comprising (1) an automated NLP system with machine learning and rule-based methods to extract medical data from articles and (2) a manual evidence synthesis approach to confirm accuracy by expert review, in order to create an RWD dataset for effectiveness related to BNT162b2 (Pfizer-BioNTech COVID-19 Vaccine) and nirmatrelvir/ritonavir (Paxlovid). NLP linguistic models for automatic data extraction were developed for topics related to vaccine effectiveness, boosters, fourth dose, variant-adapted vaccines, antiviral treatment effectiveness, and long COVID. RWD were captured in a novel Excel data template. RESULTS: NLP model training and development were conducted on the CORD-19 Open Research Dataset, and NLP-based extraction was applied to 100 studies. Elapsed NLP-based extraction time was compared to manual extraction by 6 scientific experts over a 10-day test, and speed gains of 2.5-3.0x were achieved. NLP-based and manual extraction of data from >130 publications was then used to present individual study results and summarize effectiveness results on two publicly available websites (www.vaccinemedicaldata.com; www.antiviralmedicaldata.com) with high data accuracy. NLP also allowed for automated labeling and output normalization of data into a scalable model using ontologies. CONCLUSION: NLP is an efficient technology for automated extraction of medical data from published literature for vaccine and antiviral effectiveness topics. Automated medical data mining using NLP may support end-to-end extraction of study results to quickly educate prescribers and improve vaccine and antiviral healthcare decision making. DISCLOSURES: Anupama Raghuram, MD, Merck: Employee|Merck: Stocks/Bonds|Pfizer: Employee|Pfizer: Stocks/Bonds Anna Ssentongo, PhD, IQVIA: Employee Lenon Mendes Pereira, PhD, IQVIA: Employee Yuting Kuang, PhD, IQVIA: Employee Jennifer Uyei, PhD, MPH, IQVIA: Employee|IQVIA: Stocks/Bonds Peter Park, PhD, Pfizer: Employee|Pfizer: Stocks/Bonds Oxford University Press 2023-11-27 /pmc/articles/PMC10678089/ http://dx.doi.org/10.1093/ofid/ofad500.1999 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Infectious Diseases Society of America. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Abstract
Raghuram, Anupama
Ssentongo, Anna
Pereira, Lenon Mendes
Kuang, Yuting
Uyei, Jennifer
Park, Peter
2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title_full 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title_fullStr 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title_full_unstemmed 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title_short 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
title_sort 2378. use of natural language processing to extract published real world data on a covid vaccine and antiviral treatment
topic Abstract
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10678089/
http://dx.doi.org/10.1093/ofid/ofad500.1999
work_keys_str_mv AT raghuramanupama 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment
AT ssentongoanna 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment
AT pereiralenonmendes 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment
AT kuangyuting 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment
AT uyeijennifer 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment
AT parkpeter 2378useofnaturallanguageprocessingtoextractpublishedrealworlddataonacovidvaccineandantiviraltreatment