PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning
The recent outbreak of novel Coronavirus disease or COVID-19 is declared a pandemic by the World Health Organization (WHO). The availability of social media platforms has played a vital role in providing and obtaining information about any ongoing event. However, consuming a vast amount of online te...
Main Authors: | Gupta, Aakansha; Katarya, Rahul |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505021/ https://www.ncbi.nlm.nih.gov/pubmed/34655902 http://dx.doi.org/10.1016/j.compbiomed.2021.104920 |
_version_ | 1784581437304864768 |
---|---|
author | Gupta, Aakansha Katarya, Rahul |
author_facet | Gupta, Aakansha Katarya, Rahul |
author_sort | Gupta, Aakansha |
collection | PubMed |
description | The recent outbreak of novel Coronavirus disease or COVID-19 is declared a pandemic by the World Health Organization (WHO). The availability of social media platforms has played a vital role in providing and obtaining information about any ongoing event. However, consuming a vast amount of online textual data to predict an event's trends can be troublesome. To our knowledge, no study analyzes the online news articles and the disease data about coronavirus disease. Therefore, we propose an LDA-based topic model, called PAN-LDA (Pandemic-Latent Dirichlet allocation), that incorporates the COVID-19 cases data and news articles into common LDA to obtain a new set of features. The generated features are introduced as additional features to machine learning (ML) algorithms to improve the forecasting of time series data. Furthermore, we are employing collapsed Gibbs sampling (CGS) as the underlying technique for parameter inference. The results from experiments suggest that the obtained features from PAN-LDA generate more identifiable topics and empirically add value to the outcome. |
format | Online Article Text |
id | pubmed-8505021 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8505021 2021-10-12 PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning Gupta, Aakansha Katarya, Rahul Comput Biol Med Article The recent outbreak of novel Coronavirus disease or COVID-19 is declared a pandemic by the World Health Organization (WHO). The availability of social media platforms has played a vital role in providing and obtaining information about any ongoing event. However, consuming a vast amount of online textual data to predict an event's trends can be troublesome. To our knowledge, no study analyzes the online news articles and the disease data about coronavirus disease. Therefore, we propose an LDA-based topic model, called PAN-LDA (Pandemic-Latent Dirichlet allocation), that incorporates the COVID-19 cases data and news articles into common LDA to obtain a new set of features. The generated features are introduced as additional features to machine learning (ML) algorithms to improve the forecasting of time series data. Furthermore, we are employing collapsed Gibbs sampling (CGS) as the underlying technique for parameter inference. The results from experiments suggest that the obtained features from PAN-LDA generate more identifiable topics and empirically add value to the outcome. Elsevier Ltd. 2021-11 2021-10-12 /pmc/articles/PMC8505021/ /pubmed/34655902 http://dx.doi.org/10.1016/j.compbiomed.2021.104920 Text en © 2021 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Gupta, Aakansha Katarya, Rahul PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title | PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title_full | PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title_fullStr | PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title_full_unstemmed | PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title_short | PAN-LDA: A latent Dirichlet allocation based novel feature extraction model for COVID-19 data using machine learning |
title_sort | pan-lda: a latent dirichlet allocation based novel feature extraction model for covid-19 data using machine learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8505021/ https://www.ncbi.nlm.nih.gov/pubmed/34655902 http://dx.doi.org/10.1016/j.compbiomed.2021.104920 |
work_keys_str_mv | AT guptaaakansha panldaalatentdirichletallocationbasednovelfeatureextractionmodelforcovid19datausingmachinelearning AT kataryarahul panldaalatentdirichletallocationbasednovelfeatureextractionmodelforcovid19datausingmachinelearning |
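
The description field above names collapsed Gibbs sampling (CGS) as the inference technique and says topic-model output is fed to ML algorithms as extra features. The record gives no further detail, so the following is only a minimal sketch of CGS for plain LDA, not the PAN-LDA model itself: it does not use case-count data, and the function name `lda_collapsed_gibbs`, its parameters, and the default hyperparameters are assumptions made for illustration.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (illustrative, not PAN-LDA).

    docs is a list of documents, each a list of integer word ids
    in the range [0, vocab_size).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions, usable as feature vectors.
    theta = [
        [(n + alpha) / (len(doc) + n_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word
```

Each row of the returned `theta` sums to 1 and could, under these assumptions, be appended to a forecasting model's inputs for the matching time step. How PAN-LDA actually combines case data with news text is not stated in this record.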