Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction

Bibliographic Details
Main Authors: Zhang, Xu; Liu, Yiwei; Wang, Yaming; Zhang, Liang; Feng, Lin; Jin, Bo; Zhang, Hongzhe
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170271/
https://www.ncbi.nlm.nih.gov/pubmed/35677562
http://dx.doi.org/10.3389/fgene.2022.769828
author Zhang, Xu
Liu, Yiwei
Wang, Yaming
Zhang, Liang
Feng, Lin
Jin, Bo
Zhang, Hongzhe
collection PubMed
description Understanding protein secondary structure is important in bioinformatics for studying diseases and finding new treatments. Because determining protein secondary structure by physical experiment is time-consuming and expensive, pattern recognition and machine learning methods have been proposed for predicting it. However, most of these methods achieve quite similar performance, which suggests that they have reached a model capacity bottleneck. Both model design and the learning process affect a model's learning capacity; this work focuses on the learning process. To this end, a framework called the Multistage Combination Classifier Augmented Model (MCCM) is proposed for the protein secondary structure prediction task. First, a feature extraction module extracts features with different levels of learning difficulty. Second, multistage combination classifiers learn separate decision boundaries for easy and hard samples; the classifier for hard samples penalizes their loss values, which improves prediction performance on hard samples. Third, a sample difficulty discrimination module, based on the Dirichlet distribution and an information entropy measure, assigns samples of different learning difficulty to the corresponding classifiers. Experimental results on the publicly available CB513 benchmark dataset show that the method outperforms most state-of-the-art models.
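The description above names a Dirichlet distribution and an entropy measure for separating easy from hard samples, but this record gives no formulas. The following is only a minimal sketch of one common way such a rule can be built, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of non-negative per-class "evidence" scores with Dirichlet parameters alpha = evidence + 1, the three-class input, and the 0.5 threshold.

```python
import math


def dirichlet_mean(evidence):
    """Expected class probabilities of a Dirichlet with alpha = evidence + 1.

    `evidence` is a hypothetical list of non-negative per-class scores.
    """
    alpha = [e + 1.0 for e in evidence]
    total = sum(alpha)
    return [a / total for a in alpha]


def normalized_entropy(probs):
    """Shannon entropy divided by log(K), so the value lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def is_hard(evidence, threshold=0.5):
    """Treat a sample as hard when its prediction is close to uniform.

    The threshold is an illustrative assumption, not a value from the article.
    """
    return normalized_entropy(dirichlet_mean(evidence)) > threshold


def route(samples, threshold=0.5):
    """Split sample indices into (easy, hard) lists for two classifiers."""
    easy, hard = [], []
    for idx, evidence in enumerate(samples):
        (hard if is_hard(evidence, threshold) else easy).append(idx)
    return easy, hard
```

In this sketch a sample with no evidence for any class yields a uniform expected distribution and is routed to the hard group, while a sample with strong evidence for a single class has low entropy and is routed to the easy group.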
format Online
Article
Text
id pubmed-9170271
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9170271 2022-06-07 Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction Zhang, Xu Liu, Yiwei Wang, Yaming Zhang, Liang Feng, Lin Jin, Bo Zhang, Hongzhe Front Genet Genetics Frontiers Media S.A. 2022-05-23 /pmc/articles/PMC9170271/ /pubmed/35677562 http://dx.doi.org/10.3389/fgene.2022.769828 Text en Copyright © 2022 Zhang, Liu, Wang, Zhang, Feng, Jin and Zhang.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Multistage Combination Classifier Augmented Model for Protein Secondary Structure Prediction
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9170271/
https://www.ncbi.nlm.nih.gov/pubmed/35677562
http://dx.doi.org/10.3389/fgene.2022.769828