Efficacy of different protein descriptors in predicting protein functional families

Bibliographic Details
Main authors: Ong, Serene AK; Lin, Hong Huang; Chen, Yu Zong; Li, Ze Rong; Cao, Zhiwei
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1997217/
https://www.ncbi.nlm.nih.gov/pubmed/17705863
http://dx.doi.org/10.1186/1471-2105-8-300
collection PubMed
description BACKGROUND: Sequence-derived structural and physicochemical descriptors have frequently been used in machine learning prediction of protein functional families. There is therefore a need to evaluate the effectiveness of these descriptor-sets comparatively, using the same method and parameter optimization algorithm, and to examine whether combining descriptor-sets improves predictive performance. Six individual descriptor-sets and four combination-sets were evaluated in support vector machine (SVM) prediction of six protein functional families.
RESULTS: The descriptor-sets were ranked by Matthews correlation coefficient (MCC) and categorized into two groups based on their performance. While no descriptor-set is overwhelmingly favourable, certain trends were found. The combination-sets tend to give slightly but consistently higher MCC values and thus the best overall performance: three of the four combination-sets show slightly better performance, compared with only one of the six individual descriptor-sets.
CONCLUSION: Our study suggests that currently used descriptor-sets are generally useful for classifying proteins and that prediction performance may be enhanced by exploring combinations of descriptors.
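Because the descriptor-sets are ranked by MCC, a minimal sketch of that metric may help. It assumes each SVM classifier is scored as a binary member / non-member predictor from confusion-matrix counts (TP, TN, FP, FN) and uses the standard definition MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names `mcc` and `rank_by_mcc`, the descriptor-set names, and the example counts are hypothetical illustrations, not the authors' code or results. Returning 0.0 when the denominator is zero is an assumed convention.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary (member / non-member) classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is undefined when any marginal sum is zero; report 0.0.
    if denominator == 0:
        return 0.0
    return numerator / denominator


def rank_by_mcc(results: dict[str, tuple[int, int, int, int]]) -> list[tuple[str, float]]:
    """Rank descriptor-sets (hypothetical names) by MCC, best first.

    Each value is a (TP, TN, FP, FN) tuple.
    """
    scored = [(name, mcc(*counts)) for name, counts in results.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts (TP, TN, FP, FN); these are NOT results from the study.
    example = {
        "individual_set_A": (80, 90, 10, 20),
        "combination_set_AB": (85, 92, 8, 15),
    }
    for name, score in rank_by_mcc(example):
        print(f"{name}: MCC = {score:.3f}")
```

MCC ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Unlike plain accuracy, it remains informative when positive and negative examples are unbalanced, which makes it a reasonable choice for family-membership prediction.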
id pubmed-1997217
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-1997217 2007-10-02. Journal: BMC Bioinformatics (Research Article), published by BioMed Central on 2007-08-17. Copyright © 2007 Ong et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article