Identification of cyclin protein using gradient boost decision tree algorithm
Cyclin proteins can regulate the cell cycle by forming a complex with cyclin-dependent kinases to activate the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor predicti...
Main authors: | Zulfiqar, Hasan; Yuan, Shi-Shi; Huang, Qin-Lai; Sun, Zi-Jie; Dao, Fu-Ying; Yu, Xiao-Long; Lin, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346528/ https://www.ncbi.nlm.nih.gov/pubmed/34527186 http://dx.doi.org/10.1016/j.csbj.2021.07.013 |
_version_ | 1783734894024720384 |
---|---|
author | Zulfiqar, Hasan Yuan, Shi-Shi Huang, Qin-Lai Sun, Zi-Jie Dao, Fu-Ying Yu, Xiao-Long Lin, Hao |
author_facet | Zulfiqar, Hasan Yuan, Shi-Shi Huang, Qin-Lai Sun, Zi-Jie Dao, Fu-Ying Yu, Xiao-Long Lin, Hao |
author_sort | Zulfiqar, Hasan |
collection | PubMed |
description | Cyclin proteins can regulate the cell cycle by forming a complex with cyclin-dependent kinases to activate the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor predictions from sequence similarity-based methods. Thus, a machine learning model for identifying cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model identified cyclins with an accuracy of 93.06% and an AUC value of 0.971, both higher than those of two recent studies on the same data. |
format | Online Article Text |
id | pubmed-8346528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-83465282021-09-14 Identification of cyclin protein using gradient boost decision tree algorithm Zulfiqar, Hasan Yuan, Shi-Shi Huang, Qin-Lai Sun, Zi-Jie Dao, Fu-Ying Yu, Xiao-Long Lin, Hao Comput Struct Biotechnol J Research Article Cyclin proteins can regulate the cell cycle by forming a complex with cyclin-dependent kinases to activate the cell cycle. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor predictions from sequence similarity-based methods. Thus, a machine learning model for identifying cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. Afterward, these features were optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that our model identified cyclins with an accuracy of 93.06% and an AUC value of 0.971, both higher than those of two recent studies on the same data. Research Network of Computational and Structural Biotechnology 2021-07-19 /pmc/articles/PMC8346528/ /pubmed/34527186 http://dx.doi.org/10.1016/j.csbj.2021.07.013 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Research Article Zulfiqar, Hasan Yuan, Shi-Shi Huang, Qin-Lai Sun, Zi-Jie Dao, Fu-Ying Yu, Xiao-Long Lin, Hao Identification of cyclin protein using gradient boost decision tree algorithm |
title | Identification of cyclin protein using gradient boost decision tree algorithm |
title_sort | identification of cyclin protein using gradient boost decision tree algorithm |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8346528/ https://www.ncbi.nlm.nih.gov/pubmed/34527186 http://dx.doi.org/10.1016/j.csbj.2021.07.013 |
work_keys_str_mv | AT zulfiqarhasan identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT yuanshishi identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT huangqinlai identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT sunzijie identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT daofuying identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT yuxiaolong identificationofcyclinproteinusinggradientboostdecisiontreealgorithm AT linhao identificationofcyclinproteinusinggradientboostdecisiontreealgorithm |
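
The description field above names amino acid composition as one of the seven sequence encodings and ANOVA as one of the feature-ranking criteria. The following is a minimal Python sketch of those two steps only, not the authors' implementation. The function names (`amino_acid_composition`, `anova_f_score`, `rank_features`) and the toy sequences and labels are hypothetical and chosen for illustration; the mRMR, IFS, and GBDT stages are not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the frequency of each standard residue in a protein sequence."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough classes or samples to compare variances
    grand_mean = sum(values) / n
    # Variation of the class means around the overall mean.
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the samples around their own class mean.
    within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def rank_features(feature_matrix, labels):
    """Return feature indices sorted from highest to lowest F score."""
    columns = list(zip(*feature_matrix))
    scores = [anova_f_score(list(col), labels) for col in columns]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Made-up sequences and labels (1 = cyclin, 0 = non-cyclin), illustration only.
    toy_sequences = ["MKLLAAGC", "MKALLAGA", "WWYYDDEE", "WYDEEDWY"]
    toy_labels = [1, 1, 0, 0]
    matrix = [amino_acid_composition(s) for s in toy_sequences]
    top = rank_features(matrix, toy_labels)[:5]
    print([AMINO_ACIDS[i] for i in top])
```

In an incremental feature selection loop, the ranked indices would be added one at a time to a growing feature subset, with a classifier re-evaluated by cross-validation at each step; that loop and the classifier are left out here to keep the sketch free of third-party dependencies.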