2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment
BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data f...
Main authors: | Raghuram, Anupama, Ssentongo, Anna, Pereira, Lenon Mendes, Kuang, Yuting, Uyei, Jennifer, Park, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Oxford University Press
2023
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10678089/ http://dx.doi.org/10.1093/ofid/ofad500.1999 |
_version_ | 1785150283006869504 |
---|---|
author | Raghuram, Anupama Ssentongo, Anna Pereira, Lenon Mendes Kuang, Yuting Uyei, Jennifer Park, Peter |
author_facet | Raghuram, Anupama Ssentongo, Anna Pereira, Lenon Mendes Kuang, Yuting Uyei, Jennifer Park, Peter |
author_sort | Raghuram, Anupama |
collection | PubMed |
description | BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data from relevant articles presents an opportunity to use natural language processing (NLP) and machine learning to analyze large volumes of text related to COVID-19. We evaluated the use of NLP to rapidly identify and extract RWD from vaccine and antiviral effectiveness studies to summarize the data for healthcare professionals. METHODS: We used a two-step approach comprised of (1) an automated NLP system with machine learning and rule-based methods to extract medical data from articles and (2) a manual evidence synthesis approach to confirm accuracy by expert review to create a RWD dataset for effectiveness related to BNT162b2 (Pfizer-BioNTech COVID-19 Vaccine) and nirmatrelvir/ritonavir (Paxlovid). NLP linguistic models for automatic data extraction were developed for topics related to vaccine effectiveness, boosters, fourth dose, variant-adapted vaccines, antiviral treatment effectiveness, and long COVID. RWD were captured into a novel data excel template. RESULTS: NLP model training and development was conducted on CORD-19 Open Research Dataset, and NLP-based extraction has been applied on 100 studies. Elapsed NLP-based extraction time was compared to manual extraction by 6 scientific experts over a 10 day test, and speed gains of 2.5-3.0x were achieved. NLP-based and manual extraction of data from >130 publications was then used to present individual study results and summarize effectiveness results into two publicly available websites (www.vaccinemedicaldata.com; www.antiviralmedicaldata.com) with high data accuracy. NLP also allowed for automated labeling and output normalization of data into a scalable model using ontologies. 
CONCLUSION: NLP is an efficient technology for automated extraction of medical data from published literature for vaccine and antiviral effectiveness topics. Automated medical data mining using NLP may support end-to-end extraction of study results to quickly educate prescribers and improve vaccine and antiviral healthcare decision making. DISCLOSURES: Anupama Raghuram, MD, Merck: Employee|Merck: Stocks/Bonds|Pfizer: Employee|Pfizer: Stocks/Bonds Anna Ssentongo, PhD, IQVIA: Employee Lenon Mendes Pereira, PhD, IQVIA: Employee Yuting Kuang, PhD, IQVIA: Employee Jennifer Uyei, PhD, MPH, IQVIA: Employee|IQVIA: Stocks/Bonds Peter Park, PhD, Pfizer: Employee|Pfizer: Stocks/Bonds |
format | Online Article Text |
id | pubmed-10678089 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-106780892023-11-27 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment Raghuram, Anupama Ssentongo, Anna Pereira, Lenon Mendes Kuang, Yuting Uyei, Jennifer Park, Peter Open Forum Infect Dis Abstract BACKGROUND: The COVID-19 global pandemic generated an exponential increase in scientific publications over the last 3 years. LitCovid includes ∼350,000 articles related to COVID-19, vaccine and antiviral medicine development, and real-world data (RWD). The need to identify and extract medical data from relevant articles presents an opportunity to use natural language processing (NLP) and machine learning to analyze large volumes of text related to COVID-19. We evaluated the use of NLP to rapidly identify and extract RWD from vaccine and antiviral effectiveness studies to summarize the data for healthcare professionals. METHODS: We used a two-step approach comprised of (1) an automated NLP system with machine learning and rule-based methods to extract medical data from articles and (2) a manual evidence synthesis approach to confirm accuracy by expert review to create a RWD dataset for effectiveness related to BNT162b2 (Pfizer-BioNTech COVID-19 Vaccine) and nirmatrelvir/ritonavir (Paxlovid). NLP linguistic models for automatic data extraction were developed for topics related to vaccine effectiveness, boosters, fourth dose, variant-adapted vaccines, antiviral treatment effectiveness, and long COVID. RWD were captured into a novel data excel template. RESULTS: NLP model training and development was conducted on CORD-19 Open Research Dataset, and NLP-based extraction has been applied on 100 studies. Elapsed NLP-based extraction time was compared to manual extraction by 6 scientific experts over a 10 day test, and speed gains of 2.5-3.0x were achieved. 
NLP-based and manual extraction of data from >130 publications was then used to present individual study results and summarize effectiveness results into two publicly available websites (www.vaccinemedicaldata.com; www.antiviralmedicaldata.com) with high data accuracy. NLP also allowed for automated labeling and output normalization of data into a scalable model using ontologies. CONCLUSION: NLP is an efficient technology for automated extraction of medical data from published literature for vaccine and antiviral effectiveness topics. Automated medical data mining using NLP may support end-to-end extraction of study results to quickly educate prescribers and improve vaccine and antiviral healthcare decision making. DISCLOSURES: Anupama Raghuram, MD, Merck: Employee|Merck: Stocks/Bonds|Pfizer: Employee|Pfizer: Stocks/Bonds Anna Ssentongo, PhD, IQVIA: Employee Lenon Mendes Pereira, PhD, IQVIA: Employee Yuting Kuang, PhD, IQVIA: Employee Jennifer Uyei, PhD, MPH, IQVIA: Employee|IQVIA: Stocks/Bonds Peter Park, PhD, Pfizer: Employee|Pfizer: Stocks/Bonds Oxford University Press 2023-11-27 /pmc/articles/PMC10678089/ http://dx.doi.org/10.1093/ofid/ofad500.1999 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Infectious Diseases Society of America. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Abstract Raghuram, Anupama Ssentongo, Anna Pereira, Lenon Mendes Kuang, Yuting Uyei, Jennifer Park, Peter 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title | 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title_full | 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title_fullStr | 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title_full_unstemmed | 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title_short | 2378. Use of Natural Language Processing to Extract Published Real World Data on a COVID Vaccine and Antiviral Treatment |
title_sort | 2378. use of natural language processing to extract published real world data on a covid vaccine and antiviral treatment |
topic | Abstract |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10678089/ http://dx.doi.org/10.1093/ofid/ofad500.1999 |