
Machine learning models for net photosynthetic rate prediction using poplar leaf phenotype data

BACKGROUND: Tree planting is an essential component of reducing anthropogenic CO₂ emissions to the atmosphere and is key to keeping carbon dioxide emissions under control. At the 1992 Earth Summit, the United Nations agreed to take action to stabilize and reduce net global anthropogenic C...


Bibliographic Details
Main authors: Zhang, Xiao-Yu, Huang, Ziyuan, Su, Xuehui, Siu, Andrew, Song, Yuepeng, Zhang, Deqiang, Fang, Qing
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7012418/
https://www.ncbi.nlm.nih.gov/pubmed/32045452
http://dx.doi.org/10.1371/journal.pone.0228645
author Zhang, Xiao-Yu
Huang, Ziyuan
Su, Xuehui
Siu, Andrew
Song, Yuepeng
Zhang, Deqiang
Fang, Qing
collection PubMed
description BACKGROUND: Tree planting is an essential component of reducing anthropogenic CO₂ emissions to the atmosphere and is key to keeping carbon dioxide emissions under control. At the 1992 Earth Summit, the United Nations agreed to take action to stabilize and reduce net global anthropogenic CO₂ emissions. Tree planting was identified as an effective method to offset CO₂ emissions. Fast-growing trees with a high net photosynthetic rate (Pn) could efficiently fulfill the goal of CO₂ emission reduction. A net photosynthetic rate model can provide a reference for the stability of a plant's photosynthetic productivity. METHODS AND RESULTS: Using leaf phenotype data to predict the Pn can help guide tree planting policies that offset CO₂ release into the atmosphere. Tree planting has been proposed as one climate change solution, and poplars are among the most popular trees to plant. This study used a Populus simonii (P. simonii) dataset collected from 23 artificial forests in northern China. The samples represent almost the entire geographic distribution of P. simonii and cover most of the major provinces of northern China. The northwestern point reaches (36°30’N, 98°09’E). The northeastern point reaches (40°91’N, 115°83’E). The southwestern point reaches (32°31’N, 108°90’E). The southeastern point reaches (34°39’N, 113°74’E). The collected data on leaf phenotypic traits are sparse, noisy, and highly correlated. The photosynthetic rate data are non-normal and skewed. Many machine learning algorithms can produce reasonably accurate predictions despite these data issues. Influential outliers are removed to allow an accurate and precise prediction, and cluster analysis is used as part of an exploratory data analysis to investigate the dataset in further detail.
We select four regression methods that suit the dataset in this study: extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), and generalized additive model (GAM). Cross-validation and regularization mechanisms are implemented in the XGBoost, SVM, RF, and GAM algorithms to ensure the validity of the outputs. CONCLUSIONS: The best-performing approach is XGBoost, whose net photosynthetic rate predictions have a 0.77 correlation with the actual rates. Moreover, the root mean square error (RMSE) is 2.57, which is approximately 35 percent smaller than the standard deviation of 3.97. The other metrics, i.e., the mean absolute error (MAE), R², and the min-max accuracy, are 1.12, 0.60, and 0.93, respectively. This study demonstrates that machine learning models can use noisy leaf phenotype data to predict the net photosynthetic rate with reasonable accuracy. Most net photosynthetic rate prediction studies are conducted on herbaceous plants. Predicting the net photosynthetic rate of P. simonii, a woody plant, provides useful guidance for plant science and environmental science on the predictive relationship between leaf phenotypic characteristics and the Pn of woody plants in northern China.
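The evaluation metrics named in the CONCLUSIONS can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the toy values are hypothetical, R² is assumed to be 1 - SS_res/SS_tot, and min-max accuracy is assumed to be the mean ratio of the smaller to the larger value in each actual/predicted pair (a common definition that this record does not confirm). The last function reproduces the comparison of the RMSE with the standard deviation.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def pearson(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def min_max_accuracy(actual, predicted):
    """Mean of min/max per pair (assumed definition; needs positive values)."""
    ratios = [min(a, p) / max(a, p) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


def relative_reduction(rmse_value, sd_value):
    """Fraction by which the RMSE is smaller than the standard deviation."""
    return 1 - rmse_value / sd_value


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    actual = [10.0, 12.0, 14.0, 16.0]
    predicted = [11.0, 12.0, 13.0, 17.0]
    print("RMSE:", rmse(actual, predicted))
    print("MAE:", mae(actual, predicted))
    print("R^2:", r_squared(actual, predicted))
    print("Pearson r:", pearson(actual, predicted))
    print("Min-max accuracy:", min_max_accuracy(actual, predicted))
    # Figures quoted in the abstract: RMSE 2.57 against a standard deviation of 3.97.
    print("Relative reduction:", relative_reduction(2.57, 3.97))
```

With the reported figures, `relative_reduction(2.57, 3.97)` is roughly 0.35, which matches the "approximately 35 percent" in the abstract. The reported correlation of 0.77 is also in line with the reported R² of 0.60, since 0.77 squared is about 0.59.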
format Online
Article
Text
id pubmed-7012418
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7012418 2020-02-21 PLoS One Research Article Public Library of Science 2020-02-11 /pmc/articles/PMC7012418/ /pubmed/32045452 http://dx.doi.org/10.1371/journal.pone.0228645 Text en © 2020 Zhang et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Machine learning models for net photosynthetic rate prediction using poplar leaf phenotype data
topic Research Article