Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines

Bibliographic Details
Main Authors: Wang, Jiren; Sung, Wing-Kin; Krishnan, Arun; Li, Kuo-Bin
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1190155/
https://www.ncbi.nlm.nih.gov/pubmed/16011808
http://dx.doi.org/10.1186/1471-2105-6-174
_version_ 1782124793147949056
author Wang, Jiren
Sung, Wing-Kin
Krishnan, Arun
Li, Kuo-Bin
author_sort Wang, Jiren
collection PubMed
description BACKGROUND: Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous works focused on predicting protein localization in Gram-negative bacteria obtained good results. However, these methods had relatively low accuracies for the localization of extracellular proteins. This paper studies ways to improve the accuracy for predicting extracellular localization in Gram-negative bacteria. RESULTS: We have developed a system for predicting the subcellular localization of proteins for Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall of the extracellular site and overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria. CONCLUSION: Clustering 20 amino acids into a few groups by the proposed greedy algorithm provides a new way to extract features from protein sequences to cover more adjacent amino acids and hence reduce the dimensionality of the input vector of protein features. It was observed that a good amino acid grouping leads to an increase in prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines constructed by different features of proteins maximizes the prediction accuracy.
format Text
id pubmed-1190155
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1190155 2005-08-25 Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines Wang, Jiren Sung, Wing-Kin Krishnan, Arun Li, Kuo-Bin BMC Bioinformatics Research Article BACKGROUND: Predicting the subcellular localization of proteins is important for determining the function of proteins. Previous works focused on predicting protein localization in Gram-negative bacteria obtained good results. However, these methods had relatively low accuracies for the localization of extracellular proteins. This paper studies ways to improve the accuracy for predicting extracellular localization in Gram-negative bacteria. RESULTS: We have developed a system for predicting the subcellular localization of proteins for Gram-negative bacteria based on amino acid subalphabets and a combination of multiple support vector machines. The recall of the extracellular site and overall recall of our predictor reach 86.0% and 89.8%, respectively, in 5-fold cross-validation. To the best of our knowledge, these are the most accurate results for predicting subcellular localization in Gram-negative bacteria. CONCLUSION: Clustering 20 amino acids into a few groups by the proposed greedy algorithm provides a new way to extract features from protein sequences to cover more adjacent amino acids and hence reduce the dimensionality of the input vector of protein features. It was observed that a good amino acid grouping leads to an increase in prediction performance. Furthermore, a proper choice of a subset of complementary support vector machines constructed by different features of proteins maximizes the prediction accuracy. BioMed Central 2005-07-13 /pmc/articles/PMC1190155/ /pubmed/16011808 http://dx.doi.org/10.1186/1471-2105-6-174 Text en Copyright © 2005 Wang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein subcellular localization prediction for Gram-negative bacteria using amino acid subalphabets and a combination of multiple support vector machines
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1190155/
https://www.ncbi.nlm.nih.gov/pubmed/16011808
http://dx.doi.org/10.1186/1471-2105-6-174
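
Illustrative note on the abstract: the description above states that clustering the 20 amino acids into a few groups lets the features cover more adjacent residues while reducing the dimensionality of the input vector. The following is a minimal Python sketch of that idea only. The four-group alphabet, the group labels, and the function names are assumptions made for illustration; they are not the subalphabet produced by the authors' greedy algorithm, and the record does not describe the actual feature encoding or SVM setup.

```python
from collections import Counter
from itertools import product

# Hypothetical grouping of the 20 amino acids into 4 classes.
# Illustrative only: NOT the grouping found by the paper's greedy algorithm.
GROUPS = {
    "h": "AVLIMFWC",  # broadly hydrophobic
    "p": "STNQYGP",   # polar or small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids (e.g. 'X') are skipped,
    which is a simplification chosen for this sketch.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def ngram_features(seq, n=3):
    """Return normalized n-gram frequencies over the reduced alphabet.

    The vector has len(GROUPS) ** n entries, in a fixed sorted key order.
    """
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(sorted(GROUPS), repeat=n)]
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    feats = ngram_features("MKDESTAV", n=3)
    # Trigrams over 4 groups need 4**3 dimensions, versus 20**3 over
    # the full amino acid alphabet.
    print(len(feats), 20 ** 3)
```

With this assumed grouping, a trigram feature vector has 64 entries instead of 8000, which is the kind of dimensionality reduction the abstract refers to. Vectors built from several different groupings or n-gram lengths could then each feed a separate classifier, in the spirit of the combination of support vector machines the abstract mentions.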