PredCRG: A computational method for recognition of plant circadian genes by employing support vector machine with Laplace kernel

Bibliographic Details
Main Authors: Meher, Prabina Kumar, Mohapatra, Ansuman, Satpathy, Subhrajit, Sharma, Anuj, Saini, Isha, Pradhan, Sukanta Kumar, Rai, Anil
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8074503/
https://www.ncbi.nlm.nih.gov/pubmed/33902670
http://dx.doi.org/10.1186/s13007-021-00744-3
author Meher, Prabina Kumar
Mohapatra, Ansuman
Satpathy, Subhrajit
Sharma, Anuj
Saini, Isha
Pradhan, Sukanta Kumar
Rai, Anil
collection PubMed
description BACKGROUND: Circadian rhythms regulate several physiological and developmental processes in plants, so identifying genes with underlying circadian rhythmic features is pivotal. Although computational methods have been developed for the identification of circadian genes, all of them are based on gene expression datasets. We could not find any sequence-based model, which motivated us to develop the present computational method to identify the proteins encoded by circadian genes. RESULTS: Support vector machines (SVM) with seven kernels (linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace) were used for prediction with compositional, transitional and physico-chemical features. The Laplace kernel achieved the highest accuracy, 62.48%, under fivefold cross-validation. The developed model further achieved 62.96% accuracy on an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, namely Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, Oryza sativa and Sorghum bicolor, followed by functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R package, PredCRG (https://cran.r-project.org/web/packages/PredCRG/index.html), for the scientific community for proteome-wide identification of circadian genes. The present study supplements the existing computational methods as well as wet-lab experiments for the recognition of circadian genes. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13007-021-00744-3.
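As a rough illustration of the ideas named in the abstract, the sketch below computes a simple compositional feature (amino-acid frequencies) for protein sequences and evaluates a Laplace kernel between the resulting vectors. It is not the PredCRG implementation. The kernel form exp(-sigma * ||x - y||) with the Euclidean norm is an assumption (the record does not state the formula), the function names `aa_composition`, `laplace_kernel` and `gram_matrix` are hypothetical, and the example sequences are made up.

```python
import math

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20 amino-acid frequencies of a protein sequence.

    Non-standard residues are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel, assumed here to be exp(-sigma * ||x - y||)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


def gram_matrix(vectors, sigma=1.0):
    """Pairwise kernel matrix, as an SVM with a precomputed kernel would need."""
    return [[laplace_kernel(u, v, sigma) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGSSSSPP"]
    feats = [aa_composition(s) for s in seqs]
    for row in gram_matrix(feats, sigma=1.0):
        print(["%.3f" % v for v in row])
```

The kernel equals 1 for identical feature vectors and decays with distance, so sequences with similar composition score closer to 1. The width `sigma` is a tuning parameter; the value used here is arbitrary.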
format Online
Article
Text
id pubmed-8074503
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8074503 2021-04-26 Plant Methods Methodology BioMed Central 2021-04-26 /pmc/articles/PMC8074503/ /pubmed/33902670 http://dx.doi.org/10.1186/s13007-021-00744-3 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title PredCRG: A computational method for recognition of plant circadian genes by employing support vector machine with Laplace kernel
topic Methodology