Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in geno...
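Main authors: | Westhues, Cathy C.; Mahone, Gregory S.; da Silva, Sofia; Thorwarth, Patrick; Schmidt, Malthe; Richter, Jan-Christoph; Simianer, Henner; Beissinger, Timothy M. |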
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8647909/ https://www.ncbi.nlm.nih.gov/pubmed/34880880 http://dx.doi.org/10.3389/fpls.2021.699589 |
_version_ | 1784610693240061952 |
---|---|
author | Westhues, Cathy C. Mahone, Gregory S. da Silva, Sofia Thorwarth, Patrick Schmidt, Malthe Richter, Jan-Christoph Simianer, Henner Beissinger, Timothy M. |
author_sort | Westhues, Cathy C. |
collection | PubMed |
description | The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments. |
format | Online Article Text |
id | pubmed-8647909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8647909 2021-12-07 Front Plant Sci Plant Science Frontiers Media S.A. 2021-11-11 /pmc/articles/PMC8647909/ /pubmed/34880880 http://dx.doi.org/10.3389/fpls.2021.699589 Text en Copyright © 2021 Westhues, Mahone, da Silva, Thorwarth, Schmidt, Richter, Simianer and Beissinger. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks |
topic | Plant Science |
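
The four prediction problems listed in the description can be read as four ways of splitting trial records into training and test sets. The sketch below is one possible reading, not the authors' procedure: the record does not say how the splits were built, so the `Plot` record type, the `split_new_environment` helper, and the toy data are all hypothetical. It assumes that "only unobserved genotypes" means keeping test records whose genotype never appears in the training set.

```python
from typing import NamedTuple


class Plot(NamedTuple):
    """One hypothetical trial record: a genotype grown at a site in a year."""
    genotype: str
    site: str
    year: int


def split_new_environment(records, factor, held_out, unobserved_only=False):
    """Hold out every record whose `factor` ("year" or "site") equals `held_out`.

    With unobserved_only=True, the test set is further restricted to
    genotypes that do not occur anywhere in the training set (an assumed
    interpretation of "only unobserved genotypes").
    """
    train = [r for r in records if getattr(r, factor) != held_out]
    test = [r for r in records if getattr(r, factor) == held_out]
    if unobserved_only:
        seen = {r.genotype for r in train}
        test = [r for r in test if r.genotype not in seen]
    return train, test


# Toy data, invented purely for illustration.
records = [
    Plot("G1", "SiteA", 2014),
    Plot("G2", "SiteA", 2014),
    Plot("G1", "SiteB", 2015),
    Plot("G3", "SiteB", 2015),
    Plot("G2", "SiteA", 2016),
    Plot("G4", "SiteB", 2016),
]

# (1) tested and new genotypes in a new year
train1, test1 = split_new_environment(records, "year", 2016)
# (2) only unobserved genotypes in a new year
train2, test2 = split_new_environment(records, "year", 2016, unobserved_only=True)
# (3) tested and new genotypes in a new site
train3, test3 = split_new_environment(records, "site", "SiteB")
# (4) only unobserved genotypes in a new site
train4, test4 = split_new_environment(records, "site", "SiteB", unobserved_only=True)
```

In a real evaluation, each split would be repeated with every year or site held out in turn, and any model (elastic net, XGBoost, LightGBM) would be fit on the training part only.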