
Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model


Bibliographic Details
Main authors: Zhang, Lu; Liu, Min; Qin, Xinyi; Liu, Guangzhong
Format: Online Article (Text)
Language: English
Published: Hindawi 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7673955/
https://www.ncbi.nlm.nih.gov/pubmed/33224267
http://dx.doi.org/10.1155/2020/8858489
Collection: PubMed
Abstract: Succinylation is an important posttranslational modification of proteins that plays a key role in regulating protein conformation and controlling cellular function. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, succinylation sites in proteins must be identified accurately. In this study, we develop a new model, IFS-LightGBM (BO), which combines the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), the position-specific scoring matrix (PSSM), disorder status, and the composition of k-spaced amino acid pairs (CKSAAP) are first employed to extract feature information. Then, the LightGBM feature selection method is combined with the IFS method to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computational load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results show that the IFS-LightGBM (BO)-based prediction model performs better when evaluated by common metrics such as accuracy, recall, precision, the Matthews correlation coefficient (MCC), and F-measure.
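The incremental feature selection (IFS) step named in the abstract can be illustrated with a short sketch. Features are first ranked by importance (in the paper, by the LightGBM feature selection method). Nested subsets of the top-1, top-2, ... features are then scored, and the best-scoring prefix is kept. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names, the `evaluate` callback (which in practice would train and cross-validate a classifier such as LightGBM on each subset), the one-feature-at-a-time step, and the toy scorer are all hypothetical. This record does not state the exact scoring metric or step size the paper uses.

```python
from __future__ import annotations

from typing import Callable, Sequence


def rank_by_importance(importances: Sequence[float]) -> list[int]:
    """Return feature indices sorted by decreasing importance score.

    Assumption: the scores are supplied by an earlier step (in the paper,
    LightGBM feature selection). Python's sort is stable, so tied features
    keep their original index order.
    """
    return sorted(range(len(importances)), key=lambda i: -importances[i])


def incremental_feature_selection(
    ranked_features: Sequence[int],
    evaluate: Callable[[list[int]], float],
) -> tuple[list[int], float, list[float]]:
    """Score the top-1, top-2, ..., top-n ranked features; keep the best prefix.

    `evaluate` is a hypothetical callback: in practice it would train and
    cross-validate a classifier on the given feature subset and return a
    score where higher is better (e.g. accuracy or MCC).
    Returns (best_subset, best_score, score_curve).
    """
    best_subset: list[int] = []
    best_score = float("-inf")
    curve: list[float] = []
    for n in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:n])
        score = evaluate(subset)
        curve.append(score)
        # Strict ">" means that, among tied scores, the smallest subset wins.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


if __name__ == "__main__":
    # Toy data (hypothetical): features 0 and 2 are informative,
    # and every extra feature costs 0.1 in the score.
    importances = [5.0, 1.0, 3.0, 0.5]
    ranked = rank_by_importance(importances)

    def toy_score(subset: list[int]) -> float:
        return sum(1.0 for f in subset if f in (0, 2)) - 0.1 * len(subset)

    best, score, curve = incremental_feature_selection(ranked, toy_score)
    print(ranked, best, round(score, 2))  # [0, 2, 1, 3] [0, 2] 1.8
```

Recording the full score curve mirrors how IFS results are usually reported, as a plot of score against the number of top-ranked features. Choosing the smallest subset among ties favors a more compact feature set for the downstream classifier.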
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Comput Math Methods Med (Computational and Mathematical Methods in Medicine), Research Article, published 2020-11-10
Copyright © 2020 Lu Zhang et al. License: https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.