An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data
In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing for different structures of nonlinearity using Kernel functions and analyzing the sensitivity of the models’ predictive performance to different hyperparameters set...
Main authors: | Peng, Yaohao; Nagata, Mateus Hiro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier Ltd. 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324351/ https://www.ncbi.nlm.nih.gov/pubmed/32834608 http://dx.doi.org/10.1016/j.chaos.2020.110055 |
_version_ | 1783551922232360960 |
---|---|
author | Peng, Yaohao Nagata, Mateus Hiro |
author_facet | Peng, Yaohao Nagata, Mateus Hiro |
author_sort | Peng, Yaohao |
collection | PubMed |
description | In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing for different structures of nonlinearity using kernel functions and analyzing the sensitivity of the models’ predictive performance to different hyperparameter settings using 3-D interpolated surfaces. In our experiment, the model that incorporates the highest degree of nonlinearity (Gaussian kernel) had the best in-sample performance, but also yielded the worst out-of-sample predictions, a typical example of overfitting in a machine learning model. On the other hand, the linear kernel function performed badly in-sample but generated the best out-of-sample forecasts. The findings of this paper provide an empirical assessment of fundamental concepts in data analysis and evidence the need for caution when applying machine learning models to support real-world decision making, notably with respect to the challenges arising from the COVID-19 pandemic. |
format | Online Article Text |
id | pubmed-7324351 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier Ltd. |
record_format | MEDLINE/PubMed |
spelling | pubmed-73243512020-06-30 An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data Peng, Yaohao Nagata, Mateus Hiro Chaos Solitons Fractals Article In this paper, we applied support vector regression to predict the number of COVID-19 cases for the 12 most-affected countries, testing for different structures of nonlinearity using Kernel functions and analyzing the sensitivity of the models’ predictive performance to different hyperparameters settings using 3-D interpolated surfaces. In our experiment, the model that incorporates the highest degree of nonlinearity (Gaussian Kernel) had the best in-sample performance, but also yielded the worst out-of-sample predictions, a typical example of overfitting in a machine learning model. On the other hand, the linear Kernel function performed badly in-sample but generated the best out-of-sample forecasts. The findings of this paper provide an empirical assessment of fundamental concepts in data analysis and evidence the need for caution when applying machine learning models to support real-world decision making, notably with respect to the challenges arising from the COVID-19 pandemics. Elsevier Ltd. 2020-10 2020-06-30 /pmc/articles/PMC7324351/ /pubmed/32834608 http://dx.doi.org/10.1016/j.chaos.2020.110055 Text en © 2020 Elsevier Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. 
Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active. |
spellingShingle | Article Peng, Yaohao Nagata, Mateus Hiro An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title | An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title_full | An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title_fullStr | An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title_full_unstemmed | An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title_short | An empirical overview of nonlinearity and overfitting in machine learning using COVID-19 data |
title_sort | empirical overview of nonlinearity and overfitting in machine learning using covid-19 data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7324351/ https://www.ncbi.nlm.nih.gov/pubmed/32834608 http://dx.doi.org/10.1016/j.chaos.2020.110055 |
work_keys_str_mv | AT pengyaohao anempiricaloverviewofnonlinearityandoverfittinginmachinelearningusingcovid19data AT nagatamateushiro anempiricaloverviewofnonlinearityandoverfittinginmachinelearningusingcovid19data AT pengyaohao empiricaloverviewofnonlinearityandoverfittinginmachinelearningusingcovid19data AT nagatamateushiro empiricaloverviewofnonlinearityandoverfittinginmachinelearningusingcovid19data |
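
The description field above contrasts a linear kernel with a Gaussian kernel in support vector regression. As a minimal sketch of what those two kernel functions compute, the Python below evaluates both on made-up one-dimensional inputs. The function names, the `gamma` value, and the day-index inputs are illustrative assumptions; they are not taken from the paper or from any particular library.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance).

    gamma is a hypothetical bandwidth setting chosen only for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical inputs (for example, a day index); not data from the paper.
train_day = [10.0]
near_day = [11.0]
far_day = [40.0]

for label, day in (("near", near_day), ("far", far_day)):
    # Print the similarity each kernel assigns between the training point
    # and a nearby or distant query point.
    print(label, linear_kernel(train_day, day), gaussian_kernel(train_day, day))
```

Under these assumptions, the Gaussian kernel value shrinks toward zero as the query point moves away from the training point, while the linear kernel keeps growing with the input. That is one common intuition for why a highly flexible kernel can fit in-sample data closely yet extrapolate poorly; it is offered as background, not as the authors' own explanation of their results.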