Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany
Main author: | |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Berlin Heidelberg 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9440329/ https://www.ncbi.nlm.nih.gov/pubmed/36056956 http://dx.doi.org/10.1007/s00484-022-02356-5 |
Summary: | ABSOLUT v1.2 is an adaptive algorithm that uses correlations between time-aggregated weather variables and crop yields for yield prediction. In contrast to conventional regression-based yield prediction methods, a very broad range of possible input features and their combinations are exhaustively tested for maximum explanatory power. Weather variables such as temperature, precipitation, and sunshine duration are aggregated over different seasonal time periods preceding the harvest into 45 potential input features per original variable. In a first step, this large set of features is reduced to those aggregates very probably holding explanatory power for observed yields. The second, computationally demanding step evaluates predictions for all districts with all of their possible combinations. Step three selects those combinations of weather features that showed the highest predictive power across districts. Finally, the district-specific best performing regressions among these are used for actual prediction, and the results are spatially aggregated. To evaluate the new approach, ABSOLUT v1.2 is applied to predict the yields of silage maize, winter wheat, and other major crops in Germany based on two decades of data from about 300 districts. It turned out to be crucial not only to make out-of-sample predictions (based solely on data excluding the target year) but also to consistently separate training and testing years during feature selection. Otherwise, the prediction accuracy would be over-estimated by far. The question arises whether performances claimed for other statistical modelling examples are often upward-biased through input variable selection disregarding the out-of-sample principle. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00484-022-02356-5. |
---|
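
The out-of-sample principle described in the summary can be illustrated with a minimal Python sketch. It is not the ABSOLUT v1.2 implementation: a simple correlation filter stands in for the algorithm's selection steps, and all function names (`contiguous_window_means`, `top_correlated`, `loo_predictions`) are hypothetical. The sketch also assumes nine months of data per year, chosen only because nine months yield exactly 45 contiguous windows; the summary does not state how the 45 aggregates are actually formed.

```python
import numpy as np


def contiguous_window_means(monthly):
    """Average monthly values (years x months) over every contiguous month window.

    Assumption: with 9 months there are 9 * 10 / 2 = 45 such windows.
    """
    n_months = monthly.shape[1]
    windows = [(s, e) for s in range(n_months) for e in range(s + 1, n_months + 1)]
    features = np.column_stack([monthly[:, s:e].mean(axis=1) for s, e in windows])
    return features, windows


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, fitted on the training rows only."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def top_correlated(X, y, k):
    """Indices of the k features with the highest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(corr)[::-1][:k]


def loo_predictions(X, y, k, nested):
    """Leave-one-year-out predictions.

    nested=True : features are re-selected without the target year (out-of-sample).
    nested=False: features are selected once on all years, so the target year
                  leaks into the selection even though the fit excludes it.
    """
    n = len(y)
    preds = np.empty(n)
    leaky_selection = top_correlated(X, y, k)
    for i in range(n):
        train = np.arange(n) != i
        sel = top_correlated(X[train], y[train], k) if nested else leaky_selection
        preds[i] = fit_predict(X[train][:, sel], y[train], X[i:i + 1, sel])[0]
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    monthly = rng.normal(size=(20, 9))   # synthetic weather: 20 years, 9 months
    y = rng.normal(size=20)              # synthetic yields, unrelated to the weather
    X, _ = contiguous_window_means(monthly)
    for nested in (True, False):
        rmse = np.sqrt(np.mean((loo_predictions(X, y, 2, nested) - y) ** 2))
        print(f"nested selection={nested}: RMSE={rmse:.3f}")
```

Because the synthetic yields here are pure noise, any apparent skill in the non-nested variant comes from selecting features with knowledge of the target year. This is the kind of over-estimation the summary warns about, although the size of the gap depends on the random data.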