PGD: a machine learning-based photosynthetic-related gene detection approach
BACKGROUND: The primary determinant of crop yield is photosynthetic capacity, which is under the control of photosynthesis-related genes. Therefore, the mining of genes involved in photosynthesis is important for the study of photosynthesis. MapMan Mercator 4 is a powerful annotation tool for assign...
Main authors: | Wang, Yunchuan; Dai, Xiuru; Fu, Daohong; Li, Pinghua; Du, Baijuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112524/ https://www.ncbi.nlm.nih.gov/pubmed/35581553 http://dx.doi.org/10.1186/s12859-022-04722-x |
_version_ | 1784709429000667136 |
---|---|
author | Wang, Yunchuan Dai, Xiuru Fu, Daohong Li, Pinghua Du, Baijuan |
author_facet | Wang, Yunchuan Dai, Xiuru Fu, Daohong Li, Pinghua Du, Baijuan |
author_sort | Wang, Yunchuan |
collection | PubMed |
description | BACKGROUND: The primary determinant of crop yield is photosynthetic capacity, which is under the control of photosynthesis-related genes. Mining genes involved in photosynthesis is therefore important for the study of photosynthesis. MapMan Mercator 4 is a powerful annotation tool for assigning genes to functional categories; however, in maize, the functions of approximately 22.15% (9520) of genes remain unclear and are labeled “not assigned”, and these may include photosynthesis-related genes that have not yet been identified. The rapidly increasing use of machine learning to solve biological problems provides a new opportunity to identify novel photosynthetic genes among the functionally “not assigned” genes in maize. RESULTS: In this study, we showed that an ensemble learning model using voting eliminates the preferences of single machine learning models. Based on this evaluation, we implemented an ensemble-based machine learning (ML) method using a majority voting scheme and observed that including RNA-seq data from multiple photosynthetic mutants, rather than only a single mutant, could increase prediction accuracy. We call this approach “A Machine Learning-based Photosynthetic-related Gene Detection approach (PGD)”. Finally, we predicted 716 photosynthesis-related genes from the “not assigned” category of the maize MapMan annotation. Protein localization prediction (TargetP) and the expression trends of these genes across maize leaf sections indicated that the prediction was reliable and robust. We made this approach available online via Google Colab. CONCLUSIONS: This study reveals a new approach for mining novel genes related to a specific functional category and provides candidate genes for researchers to experimentally define their biological functions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04722-x. |
format | Online Article Text |
id | pubmed-9112524 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9112524 2022-05-18 PGD: a machine learning-based photosynthetic-related gene detection approach Wang, Yunchuan Dai, Xiuru Fu, Daohong Li, Pinghua Du, Baijuan BMC Bioinformatics Research BACKGROUND: The primary determinant of crop yield is photosynthetic capacity, which is under the control of photosynthesis-related genes. Mining genes involved in photosynthesis is therefore important for the study of photosynthesis. MapMan Mercator 4 is a powerful annotation tool for assigning genes to functional categories; however, in maize, the functions of approximately 22.15% (9520) of genes remain unclear and are labeled “not assigned”, and these may include photosynthesis-related genes that have not yet been identified. The rapidly increasing use of machine learning to solve biological problems provides a new opportunity to identify novel photosynthetic genes among the functionally “not assigned” genes in maize. RESULTS: In this study, we showed that an ensemble learning model using voting eliminates the preferences of single machine learning models. Based on this evaluation, we implemented an ensemble-based machine learning (ML) method using a majority voting scheme and observed that including RNA-seq data from multiple photosynthetic mutants, rather than only a single mutant, could increase prediction accuracy. We call this approach “A Machine Learning-based Photosynthetic-related Gene Detection approach (PGD)”. Finally, we predicted 716 photosynthesis-related genes from the “not assigned” category of the maize MapMan annotation. Protein localization prediction (TargetP) and the expression trends of these genes across maize leaf sections indicated that the prediction was reliable and robust. We made this approach available online via Google Colab. CONCLUSIONS: This study reveals a new approach for mining novel genes related to a specific functional category and provides candidate genes for researchers to experimentally define their biological functions. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04722-x. BioMed Central 2022-05-17 /pmc/articles/PMC9112524/ /pubmed/35581553 http://dx.doi.org/10.1186/s12859-022-04722-x Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Wang, Yunchuan Dai, Xiuru Fu, Daohong Li, Pinghua Du, Baijuan PGD: a machine learning-based photosynthetic-related gene detection approach |
title | PGD: a machine learning-based photosynthetic-related gene detection approach |
title_full | PGD: a machine learning-based photosynthetic-related gene detection approach |
title_fullStr | PGD: a machine learning-based photosynthetic-related gene detection approach |
title_full_unstemmed | PGD: a machine learning-based photosynthetic-related gene detection approach |
title_short | PGD: a machine learning-based photosynthetic-related gene detection approach |
title_sort | pgd: a machine learning-based photosynthetic-related gene detection approach |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9112524/ https://www.ncbi.nlm.nih.gov/pubmed/35581553 http://dx.doi.org/10.1186/s12859-022-04722-x |
work_keys_str_mv | AT wangyunchuan pgdamachinelearningbasedphotosyntheticrelatedgenedetectionapproach AT daixiuru pgdamachinelearningbasedphotosyntheticrelatedgenedetectionapproach AT fudaohong pgdamachinelearningbasedphotosyntheticrelatedgenedetectionapproach AT lipinghua pgdamachinelearningbasedphotosyntheticrelatedgenedetectionapproach AT dubaijuan pgdamachinelearningbasedphotosyntheticrelatedgenedetectionapproach |
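
Illustrative note: the abstract in this record describes an ensemble that combines several machine learning models through a majority voting scheme. The sketch below shows only the general idea of majority voting and is not the authors' implementation. Everything in it is an assumption made for illustration: the three threshold rules standing in for trained classifiers, the gene names, and the feature values (imagined here as log2 fold changes of a gene in three photosynthetic mutants relative to wild type).

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained model: maps a gene's expression
# profile to 1 (photosynthesis-related) or 0 (not related).
Classifier = Callable[[Sequence[float]], int]


def mean_rule(profile: Sequence[float]) -> int:
    """Vote 1 if the average change across mutants is below -1.0."""
    return int(sum(profile) / len(profile) < -1.0)


def min_rule(profile: Sequence[float]) -> int:
    """Vote 1 if the strongest decrease in any mutant is below -2.0."""
    return int(min(profile) < -2.0)


def count_rule(profile: Sequence[float]) -> int:
    """Vote 1 if the gene is down (below -0.5) in at least two mutants."""
    return int(sum(1 for value in profile if value < -0.5) >= 2)


def majority_vote(classifiers: List[Classifier], profile: Sequence[float]) -> int:
    """Return the label chosen by most classifiers; ties resolve to 0."""
    votes = Counter(clf(profile) for clf in classifiers)
    return 1 if votes[1] > votes[0] else 0


def predict_genes(
    classifiers: List[Classifier], profiles: Dict[str, Sequence[float]]
) -> List[str]:
    """Return the sorted names of genes that the ensemble labels as 1."""
    return sorted(
        gene for gene, profile in profiles.items()
        if majority_vote(classifiers, profile) == 1
    )


# Toy data, invented for this example only.
CLASSIFIERS: List[Classifier] = [mean_rule, min_rule, count_rule]
PROFILES: Dict[str, Sequence[float]] = {
    "gene_A": [-2.5, -1.8, -1.2],  # all three rules vote 1
    "gene_B": [-2.4, 0.3, 0.1],    # only min_rule votes 1
    "gene_C": [-1.5, -1.1, -0.9],  # mean_rule and count_rule vote 1
    "gene_D": [0.2, -0.1, 0.4],    # no rule votes 1
}

if __name__ == "__main__":
    print(predict_genes(CLASSIFIERS, PROFILES))
```

In this toy setup a gene flagged by only one rule (gene_B) is rejected, which illustrates how voting can dampen the preference of any single model. A real pipeline would replace the threshold rules with trained classifiers.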