Forecasting Corn Yield With Machine Learning Ensembles

The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared...

Bibliographic Details
Main Authors:	Shahhosseini, Mohsen; Hu, Guiping; Archontoulis, Sotirios V.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2020
Subjects:	Plant Science
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7411227/ https://www.ncbi.nlm.nih.gov/pubmed/32849688 http://dx.doi.org/10.3389/fpls.2020.01120

_version_	1783568332585172992
author	Shahhosseini, Mohsen Hu, Guiping Archontoulis, Sotirios V.
author_facet	Shahhosseini, Mohsen Hu, Guiping Archontoulis, Sotirios V.
author_sort	Shahhosseini, Mohsen
collection	PubMed
description	The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. However, a single machine learning model can be outperformed by a “committee” of models (machine learning ensembles) that can reduce prediction bias, variance, or both and can better capture the underlying distribution of the data. Yet, there are many aspects to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated as previous studies considered all data available to predict yields. This paper provides a machine learning-based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with a relative root mean square error (RRMSE) of 9.5%. Stacked LASSO makes the least biased predictions (mean bias error, MBE, of 53 kg/ha), while other ensemble models also outperformed the base learners in terms of bias. In contrast, even though random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, as they require the data to be non-IID to perform favorably. Comparing our proposed model forecasts with the literature demonstrates the acceptable performance of forecasts made by our proposed ensemble model. Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts with an RRMSE of 9.2% can be made as early as June 1st. Moreover, the proposed model performed better than individual models and benchmark ensembles at the agricultural district and state levels as well as at the county level. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18–24 (May 1st to June 1st) are the most important input features.
format	Online Article Text
id	pubmed-7411227
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-7411227 2020-08-25 Forecasting Corn Yield With Machine Learning Ensembles Shahhosseini, Mohsen Hu, Guiping Archontoulis, Sotirios V. Front Plant Sci Plant Science The emergence of new technologies to synthesize and analyze big data with high-performance computing has increased our capacity to more accurately predict crop yields. Recent research has shown that machine learning (ML) can provide reasonable predictions faster and with higher flexibility compared to simulation crop modeling. However, a single machine learning model can be outperformed by a “committee” of models (machine learning ensembles) that can reduce prediction bias, variance, or both and can better capture the underlying distribution of the data. Yet, there are many aspects to be investigated with regard to prediction accuracy, time of the prediction, and scale. The earlier the prediction during the growing season the better, but this has not been thoroughly investigated as previous studies considered all data available to predict yields. This paper provides a machine learning-based framework to forecast corn yields in three US Corn Belt states (Illinois, Indiana, and Iowa) considering complete and partial in-season weather knowledge. Several ensemble models are designed using a blocked sequential procedure to generate out-of-bag predictions. The forecasts are made at the county level and aggregated to the agricultural district and state levels. Results show that the proposed optimized weighted ensemble and the average ensemble are the most precise models, with a relative root mean square error (RRMSE) of 9.5%. Stacked LASSO makes the least biased predictions (mean bias error, MBE, of 53 kg/ha), while other ensemble models also outperformed the base learners in terms of bias. In contrast, even though random k-fold cross-validation is replaced by the blocked sequential procedure, stacked ensembles do not perform as well as weighted ensemble models on time series data sets, as they require the data to be non-IID to perform favorably. Comparing our proposed model forecasts with the literature demonstrates the acceptable performance of forecasts made by our proposed ensemble model. Results from the scenario of having partial in-season weather knowledge reveal that decent yield forecasts with an RRMSE of 9.2% can be made as early as June 1st. Moreover, the proposed model performed better than individual models and benchmark ensembles at the agricultural district and state levels as well as at the county level. To find the marginal effect of each input feature on the forecasts made by the proposed ensemble model, a methodology is suggested that serves as the basis for computing feature importance for the ensemble model. The findings suggest that weather features corresponding to weeks 18–24 (May 1st to June 1st) are the most important input features. Frontiers Media S.A. 2020-07-31 /pmc/articles/PMC7411227/ /pubmed/32849688 http://dx.doi.org/10.3389/fpls.2020.01120 Text en Copyright © 2020 Shahhosseini, Hu and Archontoulis http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Plant Science Shahhosseini, Mohsen Hu, Guiping Archontoulis, Sotirios V. Forecasting Corn Yield With Machine Learning Ensembles
title	Forecasting Corn Yield With Machine Learning Ensembles
title_full	Forecasting Corn Yield With Machine Learning Ensembles
title_fullStr	Forecasting Corn Yield With Machine Learning Ensembles
title_full_unstemmed	Forecasting Corn Yield With Machine Learning Ensembles
title_short	Forecasting Corn Yield With Machine Learning Ensembles
title_sort	forecasting corn yield with machine learning ensembles
topic	Plant Science
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7411227/ https://www.ncbi.nlm.nih.gov/pubmed/32849688 http://dx.doi.org/10.3389/fpls.2020.01120
work_keys_str_mv	AT shahhosseinimohsen forecastingcornyieldwithmachinelearningensembles AT huguiping forecastingcornyieldwithmachinelearningensembles AT archontoulissotiriosv forecastingcornyieldwithmachinelearningensembles
