Profiles and Majority Voting-Based Ensemble Method for Protein Secondary Structure Prediction

Machine learning techniques have been widely applied to solve the problem of predicting protein secondary structure from the amino acid sequence. They have gained substantial success in this research area. Many methods have been used including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs)...

Bibliographic Details
Main authors: Bouziane, Hafida; Messabih, Belhadri; Chouarfia, Abdallah
Format: Online Article Text
Language: English
Published: Libertas Academica 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3204938/
https://www.ncbi.nlm.nih.gov/pubmed/22058650
http://dx.doi.org/10.4137/EBO.S7931
author Bouziane, Hafida
Messabih, Belhadri
Chouarfia, Abdallah
collection PubMed
description Machine learning techniques have been widely applied to the problem of predicting protein secondary structure from the amino acid sequence, and they have achieved substantial success in this research area. Many methods have been used, including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. The prediction accuracy has been continuously improved over the years, especially by using hybrid or ensemble methods and by incorporating evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method which combines the outputs of two feed-forward ANNs, k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method can give over the individual classifiers that make up the ensemble, we have tested the proposed system on the two widely used benchmark datasets RS126 and CB513 using cross-validation tests, with PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier.
format Online
Article
Text
id pubmed-3204938
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3204938 2011-11-04 Profiles and Majority Voting-Based Ensemble Method for Protein Secondary Structure Prediction Bouziane, Hafida Messabih, Belhadri Chouarfia, Abdallah Evol Bioinform Online Original Research Machine learning techniques have been widely applied to the problem of predicting protein secondary structure from the amino acid sequence, and they have achieved substantial success in this research area. Many methods have been used, including k-Nearest Neighbors (k-NNs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs), which have attracted attention recently. Today, the main goal remains to improve the prediction quality of the secondary structure elements. The prediction accuracy has been continuously improved over the years, especially by using hybrid or ensemble methods and by incorporating evolutionary information in the form of profiles extracted from alignments of multiple homologous sequences. In this paper, we investigate how best to combine k-NNs, ANNs and Multi-class SVMs (M-SVMs) to improve secondary structure prediction of globular proteins. An ensemble method which combines the outputs of two feed-forward ANNs, k-NN and three M-SVM classifiers has been applied. Ensemble members are combined using two variants of the majority voting rule. A heuristic-based filter has also been applied to refine the prediction. To investigate how much improvement the general ensemble method can give over the individual classifiers that make up the ensemble, we have tested the proposed system on the two widely used benchmark datasets RS126 and CB513 using cross-validation tests, with PSI-BLAST position-specific scoring matrix (PSSM) profiles as inputs. The experimental results reveal that the proposed system yields significant performance gains when compared with the best individual classifier.
Libertas Academica 2011-10-10 /pmc/articles/PMC3204938/ /pubmed/22058650 http://dx.doi.org/10.4137/EBO.S7931 Text en © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited.
title Profiles and Majority Voting-Based Ensemble Method for Protein Secondary Structure Prediction
topic Original Research