Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition

Bibliographic Details
Main authors: Tamura, Takeyuki; Akutsu, Tatsuya
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2220007/
https://www.ncbi.nlm.nih.gov/pubmed/18047679
http://dx.doi.org/10.1186/1471-2105-8-466
collection PubMed
description BACKGROUND: Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics: given the amino acid sequence of a protein, predict the part of the cell to which the protein is transported. This problem is becoming more important because subcellular location information is helpful for annotating proteins and genes, and the number of complete genomes is increasing rapidly. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. RESULTS: In this paper, we propose a novel and general prediction method that combines sequence alignment techniques with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. In fivefold cross-validation tests, the overall accuracy and average MCC (Matthews correlation coefficient) were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. CONCLUSION: Although one existing predictor that uses Gene Ontology information achieves higher accuracy than ours, our accuracy is higher than that of existing predictors that use only sequence information. Since Gene Ontology information can be obtained only for known proteins, our predictor is expected to be useful for predicting the subcellular location of newly discovered proteins. Furthermore, the idea of combining alignment with amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plant proteins is also implemented as a web system and is available on .
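The abstract refers to feature vectors based on amino acid composition and reports overall accuracy and average MCC from fivefold cross-validation. The sketch below shows how a 20-dimensional composition vector and these two metrics are conventionally computed. It does not reproduce the authors' block-alignment method; the function names (`aa_composition`, `mcc`, `overall_accuracy_and_avg_mcc`) are hypothetical, and it assumes that "average MCC" means the unweighted mean of one-vs-rest MCC values over the locations.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other letter (e.g. X, B, Z) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of `seq`
    (relative frequency of each standard residue, in AMINO_ACIDS order)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for one class treated as one-vs-rest."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def overall_accuracy_and_avg_mcc(confusion: list[list[int]]) -> tuple[float, float]:
    """confusion[i][j] = number of proteins whose true location is i and
    whose predicted location is j. Returns (overall accuracy, mean per-class MCC)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    per_class = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class k predicted as another class
        fp = sum(confusion[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        per_class.append(mcc(tp, tn, fp, fn))
    # Assumption: "average MCC" is the unweighted mean over locations.
    return correct / total, sum(per_class) / n


if __name__ == "__main__":
    print(aa_composition("MKTAYIAKQR"))
    print(overall_accuracy_and_avg_mcc([[3, 1], [2, 4]]))
```

For a two-class confusion matrix the per-class MCC values coincide, because swapping the positive and negative class leaves the MCC formula unchanged; differences between per-class values only appear with three or more locations, as in the TargetP plant data.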
id pubmed-2220007
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2220007 2008-01-31 BMC Bioinformatics Research Article
BioMed Central 2007-11-30. Copyright © 2007 Tamura and Akutsu; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article