PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning

Bibliographic Details
Main authors: Gupta, Aakansha; Katarya, Rahul
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505021/
https://www.ncbi.nlm.nih.gov/pubmed/34655902
http://dx.doi.org/10.1016/j.compbiomed.2021.104920
author Gupta, Aakansha
Katarya, Rahul
collection PubMed
description The recent outbreak of the novel coronavirus disease, COVID-19, has been declared a pandemic by the World Health Organization (WHO). Social media platforms have played a vital role in providing and obtaining information about ongoing events. However, consuming a vast amount of online textual data to predict an event's trends can be troublesome. To our knowledge, no study analyzes both online news articles and disease data for coronavirus disease. Therefore, we propose an LDA-based topic model, called PAN-LDA (Pandemic-Latent Dirichlet allocation), that incorporates COVID-19 case data and news articles into standard LDA to obtain a new set of features. The generated features are supplied as additional features to machine learning (ML) algorithms to improve the forecasting of time series data. We employ collapsed Gibbs sampling (CGS) as the underlying technique for parameter inference. Experimental results suggest that the features obtained from PAN-LDA yield more identifiable topics and empirically add value to the outcome.
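The abstract names collapsed Gibbs sampling (CGS) as the inference technique and says the resulting topic features are passed to ML forecasters as additional inputs. As a rough illustration only, the sketch below implements CGS for standard LDA, not PAN-LDA itself, since this record does not describe how PAN-LDA incorporates case data into the model. The symmetric priors `alpha` and `beta`, the one-document-per-day layout, the use of the previous day's topic proportions as features, and the helper names `lda_collapsed_gibbs` and `build_features` are all hypothetical assumptions.

```python
"""Sketch: collapsed Gibbs sampling for *standard* LDA (not PAN-LDA).

Hypothetical assumptions: documents are lists of integer word ids,
alpha/beta are symmetric Dirichlet priors, and each document holds one
day's news, so its topic proportions can serve as features for that day.
"""
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments),
                # up to a normalizing constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    # Posterior-mean estimate of each document's topic proportions (theta).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(n_dk[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


def build_features(cases, theta, n_lags=2):
    """Hypothetical feature rows for day t: the n_lags previous case counts
    followed by the previous day's topic proportions; target is cases[t]."""
    X, y = [], []
    for t in range(n_lags, len(cases)):
        lags = [cases[t - j] for j in range(1, n_lags + 1)]
        X.append(lags + theta[t - 1])
        y.append(cases[t])
    return X, y


if __name__ == "__main__":
    # Toy data: four "days" of news as word-id lists, plus daily case counts.
    docs = [[0, 1, 2, 1], [2, 3, 3, 4], [0, 0, 1, 4], [3, 4, 4, 2]]
    theta = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=5, n_iters=50)
    X, y = build_features([10, 12, 15, 19], theta, n_lags=2)
    print(X, y)
```

The rows in `X` can be passed to any regressor; using the previous day's topic proportions rather than the same day's avoids leaking information from the day being forecast.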
format Online
Article
Text
id pubmed-8505021
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-8505021 2021-10-12 PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning Gupta, Aakansha Katarya, Rahul Comput Biol Med Article Elsevier Ltd. 2021-11 2021-10-12 /pmc/articles/PMC8505021/ /pubmed/34655902 http://dx.doi.org/10.1016/j.compbiomed.2021.104920 Text en © 2021 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505021/
https://www.ncbi.nlm.nih.gov/pubmed/34655902
http://dx.doi.org/10.1016/j.compbiomed.2021.104920