Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable
BACKGROUND: By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes and other machine learning algorithms we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations, or th...
Main authors: | Peto, Myron; Kloczkowski, Andrzej; Honavar, Vasant; Jernigan, Robert L |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655094/ https://www.ncbi.nlm.nih.gov/pubmed/19014713 http://dx.doi.org/10.1186/1471-2105-9-487 |
_version_ | 1782165437275963392 |
---|---|
author | Peto, Myron Kloczkowski, Andrzej Honavar, Vasant Jernigan, Robert L |
author_facet | Peto, Myron Kloczkowski, Andrzej Honavar, Vasant Jernigan, Robert L |
author_sort | Peto, Myron |
collection | PubMed |
description | BACKGROUND: By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations. RESULTS: First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms. CONCLUSION: Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%. |
format | Text |
id | pubmed-2655094 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-26550942009-03-17 Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable Peto, Myron Kloczkowski, Andrzej Honavar, Vasant Jernigan, Robert L BMC Bioinformatics Research Article BACKGROUND: By using a standard Support Vector Machine (SVM) with a Sequential Minimal Optimization (SMO) method of training, Naïve Bayes, and other machine learning algorithms, we are able to distinguish between two classes of protein sequences: those folding to highly-designable conformations and those folding to poorly- or non-designable conformations. RESULTS: First, we generate all possible compact lattice conformations for the specified shape (a hexagon or a triangle) on the 2D triangular lattice. Then we generate all possible binary hydrophobic/polar (H/P) sequences and, using a specified energy function, thread them through all of these compact conformations. If, for a given sequence, the lowest energy is obtained for a particular lattice conformation, we assume that this sequence folds to that conformation. Highly-designable conformations have many H/P sequences folding to them, while poorly-designable conformations have few or none. We classify sequences as folding to either highly- or poorly-designable conformations. We randomly selected subsets of the sequences belonging to highly-designable and poorly-designable conformations and used them to train several different standard machine learning algorithms. CONCLUSION: Using these machine learning algorithms with ten-fold cross-validation, we are able to classify the two classes of sequences with high accuracy, in some cases exceeding 95%. BioMed Central 2008-11-18 /pmc/articles/PMC2655094/ /pubmed/19014713 http://dx.doi.org/10.1186/1471-2105-9-487 Text en Copyright © 2008 Peto et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Peto, Myron Kloczkowski, Andrzej Honavar, Vasant Jernigan, Robert L Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title_full | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title_fullStr | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title_full_unstemmed | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title_short | Use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
title_sort | use of machine learning algorithms to classify binary protein sequences as highly-designable or poorly-designable |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655094/ https://www.ncbi.nlm.nih.gov/pubmed/19014713 http://dx.doi.org/10.1186/1471-2105-9-487 |
work_keys_str_mv | AT petomyron useofmachinelearningalgorithmstoclassifybinaryproteinsequencesashighlydesignableorpoorlydesignable AT kloczkowskiandrzej useofmachinelearningalgorithmstoclassifybinaryproteinsequencesashighlydesignableorpoorlydesignable AT honavarvasant useofmachinelearningalgorithmstoclassifybinaryproteinsequencesashighlydesignableorpoorlydesignable AT jerniganrobertl useofmachinelearningalgorithmstoclassifybinaryproteinsequencesashighlydesignableorpoorlydesignable |
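
The threading procedure summarized in the description field can be illustrated with a small sketch. It is not the authors' code, and several choices are assumptions made only for illustration: the shape is a 7-site hexagon, the energy function is the simple H/P contact energy (-1 per non-bonded H-H contact) because the record does not state which energy function was used, and conformations with identical contact maps are merged since they cannot be told apart energetically. The function names (`hexagon_sites`, `compact_paths`, `contact_map`, `energy`, `designability`) are hypothetical.

```python
from itertools import product

# Axial-coordinate neighbor offsets on the 2D triangular lattice.
NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hexagon_sites(radius):
    """Return the lattice sites of a hexagon of the given radius (assumed shape)."""
    return {(q, r)
            for q in range(-radius, radius + 1)
            for r in range(-radius, radius + 1)
            if abs(q + r) <= radius}


def compact_paths(sites):
    """Yield every self-avoiding walk that visits all sites exactly once."""
    def extend(path, visited):
        if len(path) == len(sites):
            yield tuple(path)
            return
        q, r = path[-1]
        for dq, dr in NEIGHBORS:
            nxt = (q + dq, r + dr)
            if nxt in sites and nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sorted(sites):
        yield from extend([start], {start})


def contact_map(path):
    """Return the set of index pairs (i, j) that are lattice neighbors but not bonded."""
    index = {site: i for i, site in enumerate(path)}
    contacts = set()
    for (q, r), i in index.items():
        for dq, dr in NEIGHBORS:
            j = index.get((q + dq, r + dr))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def energy(seq, contacts):
    """Assumed energy function: -1 for each non-bonded H-H contact."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def designability(sites):
    """Count, per distinct contact map, the sequences with that unique ground state."""
    maps = {contact_map(p) for p in compact_paths(sites)}
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=len(sites)):
        scored = [(energy(seq, m), m) for m in maps]
        best = min(e for e, _ in scored)
        winners = [m for e, m in scored if e == best]
        if len(winners) == 1:  # degenerate ground states count as non-folding
            counts[winners[0]] += 1
    return counts


if __name__ == "__main__":
    counts = designability(hexagon_sites(1))
    ranked = sorted(counts.values(), reverse=True)
    print("distinct contact maps:", len(counts))
    print("highest designability values:", ranked[:5])
```

Sequences whose unique ground state has a high count would form the highly-designable class and the rest the poorly-designable class; where to place the threshold between the two is not given in this record and would have to be chosen separately before training a classifier.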