
BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN

Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode the protein sequence information in a single way,...


Bibliographic Details
Main Authors: Feng, Chuang, Wang, Zhen, Li, Guokun, Yang, Xiaohan, Wu, Nannan, Wang, Lei
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9433275/
https://www.ncbi.nlm.nih.gov/pubmed/36060139
http://dx.doi.org/10.1155/2022/9015123
author Feng, Chuang
Wang, Zhen
Li, Guokun
Yang, Xiaohan
Wu, Nannan
Wang, Lei
collection PubMed
description Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode protein sequence information in only one way, which leads to insufficient learning of protein sequence features. To improve protein sequence encoding, this paper proposes a BERT-based PPII helix structure prediction algorithm (BERT-PPII), which learns protein sequence information with the BERT model. The BERT model's CLS vector can fairly fuse the information of each amino acid residue in a sample. Thus, we use the CLS vector as the global feature representing the sample's global contextual information. Because interactions among local amino acid residues in the protein chain have an important influence on the formation of the PPII helix, we use a CNN to extract local residue features, which further enrich the representation of protein sequence samples. We fuse the CLS vectors with the CNN local features to improve PPII structure prediction. Compared with the state-of-the-art PPIIPRED method, experimental results on the unbalanced dataset show that the proposed method improves accuracy by 1% on the strict dataset and 2% on the less strict dataset. On the balanced dataset, the AUCs of the proposed method are 0.826 on the strict dataset and 0.785 on the less strict dataset. On the independent test set, the proposed method achieves an AUC of 0.827 on the strict dataset and 0.783 on the less strict dataset. These experimental results show that the proposed BERT-PPII method achieves superior performance in predicting the PPII helix.
format Online
Article
Text
id pubmed-9433275
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9433275 2022-09-01 Biomed Res Int Research Article Hindawi 2022-08-24 /pmc/articles/PMC9433275/ /pubmed/36060139 http://dx.doi.org/10.1155/2022/9015123 Text en
Copyright © 2022 Chuang Feng et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN
topic Research Article