Cargando…
pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm
Being a new type of widespread protein post-translational modifications discovered in recent years, succinylation plays a key role in protein conformational regulation and cellular function regulation. Numerous studies have shown that succinylation modifications are closely associated with the devel...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Frontiers Media S.A.
2022
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170990/ https://www.ncbi.nlm.nih.gov/pubmed/35686053 http://dx.doi.org/10.3389/fcell.2022.894874 |
_version_ | 1784721559806541824 |
---|---|
author | Jia, Jianhua Wu, Genqiang Qiu, Wangren |
author_facet | Jia, Jianhua Wu, Genqiang Qiu, Wangren |
author_sort | Jia, Jianhua |
collection | PubMed |
description | Being a new type of widespread protein post-translational modifications discovered in recent years, succinylation plays a key role in protein conformational regulation and cellular function regulation. Numerous studies have shown that succinylation modifications are closely associated with the development of many diseases. In order to gain insight into the mechanism of succinylation, it is vital to identify lysine succinylation sites. However, experimental identification of succinylation sites is time-consuming and laborious, and traditional identification tools are unable to meet the rapid growth of datasets. Therefore, to solve this problem, we developed a new predictor named pSuc-FFSEA, which can predict succinylation sites in protein sequences by feature fusion and stacking ensemble algorithm. Specifically, the sequence information and physicochemical properties were first extracted using EBGW, One-Hot, continuous bag-of-words, chaos game representation, and AAF_DWT. Following that, feature selection was performed, which applied LASSO to select the optimal subset of features for the classifier, and then, stacking ensemble classifier was designed using two-layer stacking ensemble, selecting three classifiers, SVM, broad learning system and LightGBM classifier, as the base classifiers of the first layer, using logistic regression classifier as the meta classifier of the second layer. In order to further improve the model prediction accuracy and reduce the computational effort, bayesian optimization algorithm and grid search algorithm were utilized to optimize the hyperparameters of the classifier. Finally, the results of rigorous 10-fold cross-validation indicated our predictor showed excellent robustness and performed better than the previous prediction tools, which achieved an average prediction accuracy of 0.7773 ± 0.0120. Besides, for the convenience of the most experimental scientists, a user-friendly and comprehensive web-server for pSuc-FFSEA has been established at https://bio.cangmang.xyz/pSuc-FFSEA, by which one can easily obtain the expected data and results without going through the complicated mathematics. |
format | Online Article Text |
id | pubmed-9170990 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-91709902022-06-08 pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm Jia, Jianhua Wu, Genqiang Qiu, Wangren Front Cell Dev Biol Cell and Developmental Biology Being a new type of widespread protein post-translational modifications discovered in recent years, succinylation plays a key role in protein conformational regulation and cellular function regulation. Numerous studies have shown that succinylation modifications are closely associated with the development of many diseases. In order to gain insight into the mechanism of succinylation, it is vital to identify lysine succinylation sites. However, experimental identification of succinylation sites is time-consuming and laborious, and traditional identification tools are unable to meet the rapid growth of datasets. Therefore, to solve this problem, we developed a new predictor named pSuc-FFSEA, which can predict succinylation sites in protein sequences by feature fusion and stacking ensemble algorithm. Specifically, the sequence information and physicochemical properties were first extracted using EBGW, One-Hot, continuous bag-of-words, chaos game representation, and AAF_DWT. Following that, feature selection was performed, which applied LASSO to select the optimal subset of features for the classifier, and then, stacking ensemble classifier was designed using two-layer stacking ensemble, selecting three classifiers, SVM, broad learning system and LightGBM classifier, as the base classifiers of the first layer, using logistic regression classifier as the meta classifier of the second layer. In order to further improve the model prediction accuracy and reduce the computational effort, bayesian optimization algorithm and grid search algorithm were utilized to optimize the hyperparameters of the classifier. Finally, the results of rigorous 10-fold cross-validation indicated our predictor showed excellent robustness and performed better than the previous prediction tools, which achieved an average prediction accuracy of 0.7773 ± 0.0120. Besides, for the convenience of the most experimental scientists, a user-friendly and comprehensive web-server for pSuc-FFSEA has been established at https://bio.cangmang.xyz/pSuc-FFSEA, by which one can easily obtain the expected data and results without going through the complicated mathematics. Frontiers Media S.A. 2022-05-24 /pmc/articles/PMC9170990/ /pubmed/35686053 http://dx.doi.org/10.3389/fcell.2022.894874 Text en Copyright © 2022 Jia, Wu and Qiu. https://creativecommons.org/licenses/by/4.0/This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Cell and Developmental Biology Jia, Jianhua Wu, Genqiang Qiu, Wangren pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title | pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title_full | pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title_fullStr | pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title_full_unstemmed | pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title_short | pSuc-FFSEA: Predicting Lysine Succinylation Sites in Proteins Based on Feature Fusion and Stacking Ensemble Algorithm |
title_sort | psuc-ffsea: predicting lysine succinylation sites in proteins based on feature fusion and stacking ensemble algorithm |
topic | Cell and Developmental Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170990/ https://www.ncbi.nlm.nih.gov/pubmed/35686053 http://dx.doi.org/10.3389/fcell.2022.894874 |
work_keys_str_mv | AT jiajianhua psucffseapredictinglysinesuccinylationsitesinproteinsbasedonfeaturefusionandstackingensemblealgorithm AT wugenqiang psucffseapredictinglysinesuccinylationsitesinproteinsbasedonfeaturefusionandstackingensemblealgorithm AT qiuwangren psucffseapredictinglysinesuccinylationsitesinproteinsbasedonfeaturefusionandstackingensemblealgorithm |