Statistical Explorations and Univariate Timeseries Analysis on COVID-19 Datasets to Understand the Trend of Disease Spreading and Death

“Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)”, the novel coronavirus, is responsible for the ongoing worldwide pandemic. The “World Health Organization (WHO)” assigned an “International Classification of Diseases (ICD)” code, “COVID-19”, as the name of the new disease. Coronaviruses are g...

Bibliographic Details
Main authors: Chatterjee, Ayan; Gerdes, Martin W.; Martinez, Santiago G.
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7308840/
https://www.ncbi.nlm.nih.gov/pubmed/32486055
http://dx.doi.org/10.3390/s20113089
_version_ 1783549083413118976
author Chatterjee, Ayan
Gerdes, Martin W.
Martinez, Santiago G.
collection PubMed
description “Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)”, the novel coronavirus, is responsible for the ongoing worldwide pandemic. The “World Health Organization (WHO)” assigned an “International Classification of Diseases (ICD)” code, “COVID-19”, as the name of the new disease. Coronaviruses are generally transferred by people and many diverse species of animals, including birds and mammals such as cattle, camels, cats, and bats. Infrequently, a coronavirus can be transferred from animals to humans and then propagate among people, as with “Middle East Respiratory Syndrome (MERS-CoV)”, “Severe Acute Respiratory Syndrome (SARS-CoV)”, and now this new virus, namely “SARS-CoV-2”, or human coronavirus. Its rapid spread has sent billions of people into lockdown as health services struggle to cope. The COVID-19 outbreak comes with exponential growth of new infections as well as a growing death count. A major goal in limiting further exponential spread is to slow the transmission rate, which is denoted by a “spread factor (f)”, and we propose an algorithm in this study for analyzing it. This paper addresses the potential of data science to assess the risk factors correlated with COVID-19, after analyzing existing datasets available from “ourworldindata.org (Oxford University database)” and newly simulated datasets, following the analysis of different univariate “Long Short Term Memory (LSTM)” models for forecasting new cases and resulting deaths. The results show that vanilla, stacked, and bidirectional LSTM models outperformed multilayer LSTM models. We also discuss the findings of the statistical analysis on simulated datasets. For correlation analysis, we included features such as external temperature, rainfall, sunshine, population, infected cases, deaths, country, area, and population density for three months of 2020: January, February, and March. For univariate timeseries forecasting using LSTM, we used datasets from 1 January 2020 to 22 April 2020.
format Online
Article
Text
id pubmed-7308840
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7308840 2020-06-25 Statistical Explorations and Univariate Timeseries Analysis on COVID-19 Datasets to Understand the Trend of Disease Spreading and Death Chatterjee, Ayan Gerdes, Martin W. Martinez, Santiago G. Sensors (Basel) Article MDPI 2020-05-29 /pmc/articles/PMC7308840/ /pubmed/32486055 http://dx.doi.org/10.3390/s20113089 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Statistical Explorations and Univariate Timeseries Analysis on COVID-19 Datasets to Understand the Trend of Disease Spreading and Death
topic Article