Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models

Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, t...

Bibliographic Details
Main authors: Shah, Denis A., De Wolf, Erick D., Paul, Pierce A., Madden, Laurence V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7993824/
https://www.ncbi.nlm.nih.gov/pubmed/33720929
http://dx.doi.org/10.1371/journal.pcbi.1008831
author Shah, Denis A.
De Wolf, Erick D.
Paul, Pierce A.
Madden, Laurence V.
collection PubMed
description Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.
format Online
Article
Text
id pubmed-7993824
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7993824 2021-04-05 Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models Shah, Denis A. De Wolf, Erick D. Paul, Pierce A. Madden, Laurence V. PLoS Comput Biol Research Article
Public Library of Science 2021-03-15 /pmc/articles/PMC7993824/ /pubmed/33720929 http://dx.doi.org/10.1371/journal.pcbi.1008831 Text en © 2021 Shah et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models
topic Research Article