Identification of cyclin protein using gradient boost decision tree algorithm

Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor predicti...

Bibliographic Details
Main authors: Zulfiqar, Hasan; Yuan, Shi-Shi; Huang, Qin-Lai; Sun, Zi-Jie; Dao, Fu-Ying; Yu, Xiao-Long; Lin, Hao
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346528/
https://www.ncbi.nlm.nih.gov/pubmed/34527186
http://dx.doi.org/10.1016/j.csbj.2021.07.013
author Zulfiqar, Hasan
Yuan, Shi-Shi
Huang, Qin-Lai
Sun, Zi-Jie
Dao, Fu-Ying
Yu, Xiao-Long
Lin, Hao
author_sort Zulfiqar, Hasan
collection PubMed
description Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, cyclin sequences share low similarity, so methods based on sequence similarity predict them poorly. A machine learning model for identifying cyclin proteins is therefore urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. These features were then optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model identifies cyclins with an accuracy of 93.06% and an AUC of 0.971, both higher than those reported by two recent studies on the same data.
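To make the first steps of the pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It encodes sequences with two of the seven feature types (amino acid composition and composition of k-spaced amino acid pairs for a single gap k) and ranks the resulting features by a one-way ANOVA F-score. The function names `aac`, `cksaap`, and `anova_f`, the toy sequences, and the choice of k=1 are all assumptions made for illustration; mRMR, IFS, and the GBDT classifier are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [seq.count(a) / n for a in AMINO_ACIDS]


def cksaap(seq, k=0):
    """Composition of k-spaced amino acid pairs for one gap k (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]  # residues separated by k positions
        if pair in counts:              # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    total = total or 1
    return [counts[p] / total for p in PAIRS]


def anova_f(values, labels):
    """One-way ANOVA F-score of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        return 0.0  # F-score is undefined with a single class
    grand = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


# Toy data (made-up sequences, 1 = cyclin, 0 = non-cyclin; illustration only).
seqs = ["MKKLLAAC", "MKKLAAAC", "GGGWWPPG", "GGWWWPPG"]
labels = [1, 1, 0, 0]

X = [aac(s) + cksaap(s, k=1) for s in seqs]  # 20 + 400 features per sequence
scores = [anova_f([row[j] for row in X], labels) for j in range(len(X[0]))]
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
```

In an incremental feature selection loop, features would be added one at a time in the order given by `ranking`, with a classifier (for example scikit-learn's `GradientBoostingClassifier`) evaluated by five-fold cross-validation at each step, and the subset with the best score retained.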
format Online
Article
Text
id pubmed-8346528
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8346528 2021-09-14 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-07-19 /pmc/articles/PMC8346528/ /pubmed/34527186 http://dx.doi.org/10.1016/j.csbj.2021.07.013 Text en © 2021 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Identification of cyclin protein using gradient boost decision tree algorithm
topic Research Article