Building multiclass classifiers for remote homology detection and fold recognition

BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve bina...

Bibliographic Details
Main authors: Rangwala, Huzefa; Karypis, George
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635067/
https://www.ncbi.nlm.nih.gov/pubmed/17042943
http://dx.doi.org/10.1186/1471-2105-7-455
author Rangwala, Huzefa
Karypis, George
collection PubMed
description BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems and that the schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend to not only lead to lower error rates but also reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.
format Text
id pubmed-1635067
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1635067 2006-11-08 Building multiclass classifiers for remote homology detection and fold recognition. Rangwala, Huzefa; Karypis, George. BMC Bioinformatics. Research Article.
BioMed Central 2006-10-16 /pmc/articles/PMC1635067/ /pubmed/17042943 http://dx.doi.org/10.1186/1471-2105-7-455 Text en Copyright © 2006 Rangwala and Karypis; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Building multiclass classifiers for remote homology detection and fold recognition
topic Research Article