
Combining classifiers for improved classification of proteins from sequence or structure

BACKGROUND: Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs)...


Bibliographic Details
Main authors: Melvin, Iain, Weston, Jason, Leslie, Christina S, Noble, William S
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2561051/
https://www.ncbi.nlm.nih.gov/pubmed/18808707
http://dx.doi.org/10.1186/1471-2105-9-389
collection PubMed
description BACKGROUND: Predicting a protein's structural or functional class from its amino acid sequence or structure is a fundamental problem in computational biology. Recently, there has been considerable interest in using discriminative learning algorithms, in particular support vector machines (SVMs), for classification of proteins. However, because sufficiently many positive examples are required to train such classifiers, all SVM-based methods are hampered by limited coverage. RESULTS: In this study, we develop a hybrid machine learning approach for classifying proteins, and we apply the method to the problem of assigning proteins to structural categories based on their sequences or their 3D structures. The method combines a full-coverage but lower accuracy nearest neighbor method with higher accuracy but reduced coverage multiclass SVMs to produce a full coverage classifier with overall improved accuracy. The hybrid approach is based on the simple idea of "punting" from one method to another using a learned threshold. CONCLUSION: In cross-validated experiments on the SCOP hierarchy, the hybrid methods consistently outperform the individual component methods at all levels of coverage. Code and data sets are available at
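The "punting" idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the record: the function names `hybrid_predict` and `learn_threshold`, the use of a single confidence score per prediction, the direction of the punt (from a reduced-coverage primary classifier to a full-coverage fallback), and the choice of the threshold that maximizes held-out accuracy. The authors' actual procedure may differ.

```python
from typing import Hashable, Optional, Sequence, Tuple

Label = Hashable
# (predicted label, confidence); the label is None when the primary method
# has no model for the example (reduced coverage). Hypothetical convention.
Prediction = Tuple[Optional[Label], float]


def hybrid_predict(primary: Prediction, fallback_label: Label, threshold: float) -> Label:
    """Keep the primary prediction if it is confident enough, else punt."""
    label, confidence = primary
    if label is None or confidence < threshold:
        return fallback_label  # punt to the full-coverage method
    return label


def learn_threshold(
    primary_preds: Sequence[Prediction],
    fallback_labels: Sequence[Label],
    true_labels: Sequence[Label],
) -> float:
    """Pick the threshold with the most correct hybrid predictions on held-out data.

    Assumed selection rule: try every observed confidence value, plus
    infinity (always punt), and keep the first best one.
    """
    candidates = sorted({conf for _, conf in primary_preds}) + [float("inf")]
    best_threshold, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(
            hybrid_predict(p, f, t) == y
            for p, f, y in zip(primary_preds, fallback_labels, true_labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold
```

In this sketch the threshold would be learned on held-out examples and then passed to `hybrid_predict` for new proteins, so that every example receives a label (full coverage) while confident primary predictions are preserved.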
format Text
id pubmed-2561051
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2561051 2008-10-04 BMC Bioinformatics Research Article BioMed Central 2008-09-22 /pmc/articles/PMC2561051/ /pubmed/18808707 http://dx.doi.org/10.1186/1471-2105-9-389 Text en Copyright © 2008 Melvin et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article