BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN
Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode the protein sequence information in a single way,...
Main authors: | Feng, Chuang; Wang, Zhen; Li, Guokun; Yang, Xiaohan; Wu, Nannan; Wang, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9433275/ https://www.ncbi.nlm.nih.gov/pubmed/36060139 http://dx.doi.org/10.1155/2022/9015123 |
_version_ | 1784780591432990720 |
---|---|
author | Feng, Chuang; Wang, Zhen; Li, Guokun; Yang, Xiaohan; Wu, Nannan; Wang, Lei |
author_sort | Feng, Chuang |
collection | PubMed |
description | Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode protein sequence information in only a single way, which leads to insufficient learning of protein sequence features. To improve protein sequence encoding, this paper proposes a BERT-based PPII helix structure prediction algorithm (BERT-PPII), which learns protein sequence information with the BERT model. The BERT model's CLS vector can fairly fuse the information from each amino acid residue in a sample. Thus, we use the CLS vector as the global feature that represents the sample's global contextual information. Because interactions among local amino acid residues in the protein chain have an important influence on the formation of the PPII helix, we use a CNN to extract features of local amino acid residues, which further enriches the representation of protein sequence samples. We then fuse the CLS vectors with the CNN local features to improve the performance of PPII structure prediction. Compared with the state-of-the-art PPIIPRED method, the experimental results on the unbalanced dataset show that the proposed method improves accuracy by 1% on the strict dataset and 2% on the less strict dataset. On the balanced dataset, the AUCs of the proposed method are 0.826 on the strict dataset and 0.785 on the less strict dataset. On the independent test set, the proposed method achieves an AUC of 0.827 on the strict dataset and 0.783 on the less strict dataset. These experimental results indicate that the proposed BERT-PPII method achieves superior performance in predicting the PPII helix. |
format | Online Article Text |
id | pubmed-9433275 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9433275 2022-09-01 BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN Feng, Chuang Wang, Zhen Li, Guokun Yang, Xiaohan Wu, Nannan Wang, Lei Biomed Res Int Research Article Hindawi 2022-08-24 /pmc/articles/PMC9433275/ /pubmed/36060139 http://dx.doi.org/10.1155/2022/9015123 Text en Copyright © 2022 Chuang Feng et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9433275/ https://www.ncbi.nlm.nih.gov/pubmed/36060139 http://dx.doi.org/10.1155/2022/9015123 |
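
The description in this record says the model fuses the BERT CLS vector (global feature) with CNN features extracted from local amino acid residues. The following is a minimal sketch of that fusion step only, not the authors' implementation. It loads no BERT model: the embeddings are random stand-ins, and the embedding size, kernel widths (3, 5, 7), filter count, ReLU activation, max-pooling, and all function names are assumptions made for illustration.

```python
import random

def conv_relu_maxpool(residues, kernel):
    """Slide one 1D filter (width x dim weights) along the sequence,
    apply ReLU, and keep the maximum activation over all positions."""
    width = len(kernel)
    pooled = 0.0  # ReLU floors every activation at 0, so 0 is a safe starting maximum
    for start in range(len(residues) - width + 1):
        activation = sum(
            w * x
            for row, vec in zip(kernel, residues[start:start + width])
            for w, x in zip(row, vec)
        )
        pooled = max(pooled, activation)
    return pooled

def fuse_global_and_local(embeddings, channels):
    """Concatenate the CLS vector (row 0) with one pooled value per filter.

    `channels` maps a kernel width to a list of filters; each width acts
    as one channel of the multichannel CNN (hypothetical layout).
    """
    cls_vector, residues = embeddings[0], embeddings[1:]
    local = [
        conv_relu_maxpool(residues, kernel)
        for width in sorted(channels)
        for kernel in channels[width]
    ]
    return list(cls_vector) + local

# Random stand-ins for encoder output: 1 CLS row + 13 residue rows, 8 dimensions each (assumed sizes).
rng = random.Random(0)
dim, n_residues, n_filters = 8, 13, 4
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_residues + 1)]
channels = {
    width: [
        [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
        for _ in range(n_filters)
    ]
    for width in (3, 5, 7)
}

fused = fuse_global_and_local(embeddings, channels)
print(len(fused))  # 8 CLS dimensions + 3 channels * 4 filters = 20
```

In a full model the fused vector would feed a classifier layer that outputs the PPII/non-PPII prediction; that layer, and training, are outside this sketch.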