Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy

CONTEXT: Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. OBJECTIVE: The aim of this study was to assess the performance of statistical and machine learning methods to ex...

Bibliographic Details
Main Authors: Silva, João Vasco, Heerwaarden, Joost van, Reidsma, Pytrik, Laborte, Alice G., Tesfaye, Kindie, Ittersum, Martin K. van
Format: Online Article Text
Language: English
Published: Elsevier Scientific Pub. Co 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565834/
https://www.ncbi.nlm.nih.gov/pubmed/37840838
http://dx.doi.org/10.1016/j.fcr.2023.109063
author Silva, João Vasco
Heerwaarden, Joost van
Reidsma, Pytrik
Laborte, Alice G.
Tesfaye, Kindie
Ittersum, Martin K. van
collection PubMed
description CONTEXT: Collection and analysis of large volumes of on-farm production data are widely seen as key to understanding yield variability among farmers and improving resource-use efficiency. OBJECTIVE: The aim of this study was to assess the performance of statistical and machine learning methods to explain and predict crop yield across thousands of farmers’ fields in contrasting farming systems worldwide. METHODS: A large database of 10,940 field-year combinations from three countries in different stages of agricultural intensification was analyzed. Random effects models were used to partition crop yield variability and random forest models were used to explain and predict crop yield within a cross-validation scheme with data re-sampling over space and time. RESULTS: Yield variability in relative terms was smallest for wheat and barley in the Netherlands and for wheat in Ethiopia, intermediate for rice in the Philippines, and greatest for maize in Ethiopia. Random forest models comprising a total of 87 variables explained a maximum of 65 % of cereal yield variability in the Netherlands and less than 45 % of cereal yield variability in Ethiopia and in the Philippines. Crop management related variables were important to explain and predict cereal yields in Ethiopia, while predictive (i.e., known before the growing season) climatic variables and explanatory (i.e., known during or after the growing season) climatic variables were most important to explain and predict cereal yield variability in the Philippines and in the Netherlands, respectively. Finally, model cross-validation for regions or years not seen during model training reduced the R² considerably for most crop × country combinations, while for wheat in the Netherlands this was model dependent. CONCLUSION: Big data from farmers’ fields is useful to explain on-farm yield variability to some extent, but not to predict it across time and space. SIGNIFICANCE: The results call for moderate expectations towards big data and machine learning in agronomic studies, particularly for smallholder farms in the tropics where model performance was poorest independently of the variables considered and the cross-validation scheme used.
format Online
Article
Text
id pubmed-10565834
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier Scientific Pub. Co
record_format MEDLINE/PubMed
spelling pubmed-10565834 2023-10-15 Field Crops Res Article Elsevier Scientific Pub. Co 2023-10-15 /pmc/articles/PMC10565834/ /pubmed/37840838 http://dx.doi.org/10.1016/j.fcr.2023.109063 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Big data, small explanatory and predictive power: Lessons from random forest modeling of on-farm yield variability and implications for data-driven agronomy
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565834/
https://www.ncbi.nlm.nih.gov/pubmed/37840838
http://dx.doi.org/10.1016/j.fcr.2023.109063