
An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data

In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing for different structures of nonlinearity using Kernel functions and analyzing the sensitivity of the models’ predictive performance to different hyperparameters set...


Bibliographic Details
Main Authors: Peng, Yaohao, Nagata, Mateus Hiro
Format: Online Article Text
Language: English
Published: Elsevier Ltd. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324351/
https://www.ncbi.nlm.nih.gov/pubmed/32834608
http://dx.doi.org/10.1016/j.chaos.2020.110055
author Peng, Yaohao
Nagata, Mateus Hiro
collection PubMed
description In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing for different structures of nonlinearity using Kernel functions and analyzing the sensitivity of the models’ predictive performance to different hyperparameter settings using 3-D interpolated surfaces. In our experiment, the model that incorporates the highest degree of nonlinearity (Gaussian Kernel) had the best in-sample performance, but also yielded the worst out-of-sample predictions, a typical example of overfitting in a machine learning model. On the other hand, the linear Kernel function performed badly in-sample but generated the best out-of-sample forecasts. The findings of this paper provide an empirical assessment of fundamental concepts in data analysis and evidence the need for caution when applying machine learning models to support real-world decision making, notably with respect to the challenges arising from the COVID-19 pandemic.
format Online
Article
Text
id pubmed-7324351
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier Ltd.
record_format MEDLINE/PubMed
spelling pubmed-7324351 2020-06-30 An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data Peng, Yaohao Nagata, Mateus Hiro Chaos Solitons Fractals Article Elsevier Ltd. 2020-10 2020-06-30 /pmc/articles/PMC7324351/ /pubmed/32834608 http://dx.doi.org/10.1016/j.chaos.2020.110055 Text en © 2020 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website.
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data
topic Article
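The description in this record contrasts a linear Kernel with a Gaussian Kernel as the least and most nonlinear structures tested. As a rough illustration only, the sketch below writes out the textbook forms of three common kernel functions in plain Python. The function names, the polynomial kernel, and the default values of `gamma`, `degree`, and `coef0` are assumptions made for this example; the record does not state which parameter values or software the authors used.

```python
import math


def linear_kernel(x, z):
    """Plain inner product: the feature space is the input space (no nonlinearity)."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x.z + coef0)^degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, z) + coef0) ** degree


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); gamma is an illustrative default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, kernel):
    """Pairwise kernel evaluations, the quantity a kernel regression model is fitted on."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    # Hypothetical one-dimensional inputs, e.g. day indices of a time series.
    days = [[0.0], [1.0], [2.0]]
    for name, kernel in [
        ("linear", linear_kernel),
        ("polynomial", polynomial_kernel),
        ("gaussian", gaussian_kernel),
    ]:
        print(name, gram_matrix(days, kernel))
```

With the Gaussian kernel, similarity decays quickly with distance, so a fitted model can follow the training points very closely. That flexibility is consistent with the strong in-sample and weak out-of-sample results the description reports, although this sketch does not reproduce the experiment.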