
Protein subcellular localization prediction based on compartment-specific features and structure conservation

BACKGROUND: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of loca...


Bibliographic Details
Main Authors: Su, Emily Chia-Yu, Chiu, Hua-Sheng, Lo, Allan, Hwang, Jenn-Kang, Sung, Ting-Yi, Hsu, Wen-Lian
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2040162/
https://www.ncbi.nlm.nih.gov/pubmed/17825110
http://dx.doi.org/10.1186/1471-2105-8-330
_version_ 1782137075684868096
author Su, Emily Chia-Yu
Chiu, Hua-Sheng
Lo, Allan
Hwang, Jenn-Kang
Sung, Ting-Yi
Hsu, Wen-Lian
author_sort Su, Emily Chia-Yu
collection PubMed
description BACKGROUND: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods, including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from low coverage in high-throughput proteomic analyses because too little information is available to characterize unknown proteins. RESULTS: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers that incorporate biological features derived from Gram-negative bacteria translocation pathways. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracies of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. On the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets of sequence identity less than 30%.
CONCLUSION: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, combining the SVM model with the structural homology approach further improves the overall accuracy, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.
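The one-versus-one scheme mentioned in the RESULTS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the five compartment names are an assumption (the abstract does not list them), and `make_toy_classifier` is a hypothetical stand-in for a trained binary SVM. The sketch shows only the voting step, where one binary classifier per pair of compartments casts a vote and the compartment with the most votes is predicted.

```python
from collections import Counter
from itertools import combinations

# Assumed compartment labels, for illustration only; the abstract does not list them.
COMPARTMENTS = [
    "cytoplasm",
    "inner membrane",
    "periplasm",
    "outer membrane",
    "extracellular",
]


def make_toy_classifier(a, b):
    """Hypothetical stand-in for a trained binary SVM separating classes a and b.

    It simply returns whichever of the two classes has the higher toy score.
    """
    def classify(features):
        return a if features.get(a, 0.0) >= features.get(b, 0.0) else b
    return classify


def one_vs_one_predict(binary_classifiers, features):
    """Majority vote over one binary classifier per unordered class pair."""
    votes = Counter()
    for classify in binary_classifiers.values():
        votes[classify(features)] += 1
    # Ties are resolved in favor of the class that received a vote first.
    return votes.most_common(1)[0][0]


# k classes give k * (k - 1) / 2 binary classifiers.
classifiers = {
    (a, b): make_toy_classifier(a, b) for a, b in combinations(COMPARTMENTS, 2)
}

# Toy per-compartment scores standing in for real sequence-derived features.
query = {"periplasm": 0.9, "cytoplasm": 0.2}
prediction = one_vs_one_predict(classifiers, query)
print(len(classifiers), prediction)
```

With five assumed compartments the sketch builds ten pairwise classifiers. How the authors combine the SVM vote with the structural homology result is not described in the abstract, so that step is left out.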
format Text
id pubmed-2040162
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2040162 2007-10-23 BMC Bioinformatics Research Article BioMed Central 2007-09-08 /pmc/articles/PMC2040162/ /pubmed/17825110 http://dx.doi.org/10.1186/1471-2105-8-330 Text en Copyright © 2007 Su et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Protein subcellular localization prediction based on compartment-specific features and structure conservation
topic Research Article