
Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains

BACKGROUND: The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellu...


Bibliographic Details
Main authors: Bulashevska, Alla, Eils, Roland
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525000/
https://www.ncbi.nlm.nih.gov/pubmed/16774677
http://dx.doi.org/10.1186/1471-2105-7-298
author Bulashevska, Alla
Eils, Roland
collection PubMed
description BACKGROUND: The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method that predicts the subcellular location of a protein when only its amino acid sequence is known. Although many efforts have been made to predict subcellular location from sequence information alone, further research is needed to improve prediction accuracy. RESULTS: A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm that constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six datasets, among them a Gram-negative bacteria dataset, a dataset for discriminating outer membrane proteins, and an apoptosis protein dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve prediction accuracy for some classes with few training sequences, and it is therefore useful for datasets with an imbalanced class distribution. CONCLUSION: This study introduces an algorithm that uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. Empirical results indicate that the method is computationally efficient and competitive with previously reported approaches in terms of prediction accuracy. The code for the software is available upon request.
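The abstract names the base learner only as a "Bayesian classifier based on Markov chain models" and does not specify its details. As a rough illustration of that general idea, the sketch below fits one first-order Markov chain per location class and assigns a sequence to the class with the highest log posterior. This is not the HensBC algorithm and not the authors' code: the chain order, the pseudocount smoothing, the class name `MarkovChainClassifier`, and the toy sequences and labels are all assumptions made for illustration, and the recursive hierarchical ensemble described in the abstract is not reproduced here.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class MarkovChainClassifier:
    """Hypothetical sketch: one first-order Markov chain per class."""

    def __init__(self, pseudocount=1.0):
        self.pseudocount = pseudocount  # additive smoothing (an assumption)
        self.log_prior = {}
        self.log_initial = {}
        self.log_transition = {}

    def fit(self, sequences, labels):
        by_class = defaultdict(list)
        for seq, label in zip(sequences, labels):
            by_class[label].append(seq)
        total = len(sequences)
        for label, seqs in by_class.items():
            # Class prior estimated from class frequencies in the training set.
            self.log_prior[label] = math.log(len(seqs) / total)
            init = {a: self.pseudocount for a in AMINO_ACIDS}
            trans = {a: {b: self.pseudocount for b in AMINO_ACIDS}
                     for a in AMINO_ACIDS}
            for seq in seqs:
                # Non-standard residue symbols are skipped.
                residues = [r for r in seq if r in init]
                if not residues:
                    continue
                init[residues[0]] += 1
                for prev, cur in zip(residues, residues[1:]):
                    trans[prev][cur] += 1
            init_total = sum(init.values())
            self.log_initial[label] = {
                a: math.log(c / init_total) for a, c in init.items()
            }
            self.log_transition[label] = {}
            for a, row in trans.items():
                row_total = sum(row.values())
                self.log_transition[label][a] = {
                    b: math.log(c / row_total) for b, c in row.items()
                }
        return self

    def log_posterior(self, seq):
        """Return the unnormalized log posterior of each class for seq."""
        residues = [r for r in seq if r in AMINO_ACIDS]
        scores = {}
        for label, prior in self.log_prior.items():
            score = prior
            if residues:
                score += self.log_initial[label][residues[0]]
                for prev, cur in zip(residues, residues[1:]):
                    score += self.log_transition[label][prev][cur]
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_posterior(seq)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data only; real use would need labelled protein sequences.
    train = ["AAAAAA", "AAAAAG", "GGGGGG", "GGGGGA"]
    labels = ["class_1", "class_1", "class_2", "class_2"]
    model = MarkovChainClassifier().fit(train, labels)
    print(model.predict("AAAA"))
```

Working in log space avoids numerical underflow on long sequences, and the pseudocount keeps unseen residue pairs from producing a zero probability.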
format Text
id pubmed-1525000
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1525000 2006-08-01 Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains Bulashevska, Alla Eils, Roland BMC Bioinformatics Methodology Article BioMed Central 2006-06-14 /pmc/articles/PMC1525000/ /pubmed/16774677 http://dx.doi.org/10.1186/1471-2105-7-298 Text en Copyright © 2006 Bulashevska and Eils; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains
topic Methodology Article