Plus ça change – evolutionary sequence divergence predicts protein subcellular localization signals

BACKGROUND: Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the prese...

Bibliographic Details
Main authors: Fukasawa, Yoshinori; Leung, Ross KK; Tsui, Stephen KW; Horton, Paul
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3906766/
https://www.ncbi.nlm.nih.gov/pubmed/24438075
http://dx.doi.org/10.1186/1471-2164-15-46
author Fukasawa, Yoshinori
Leung, Ross KK
Tsui, Stephen KW
Horton, Paul
author_facet Fukasawa, Yoshinori
Leung, Ross KK
Tsui, Stephen KW
Horton, Paul
author_sort Fukasawa, Yoshinori
collection PubMed
description BACKGROUND: Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites. RESULTS: Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover, sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand-curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However, a post-hoc analysis of the proteins in which sequence divergence changes the prediction yielded some proteins with atypical (i.e., not MPP-cleaved) matrix targeting signals as well as a few misannotations. CONCLUSION: We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless, we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail.
format Online
Article
Text
id pubmed-3906766
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3906766 2014-02-12 Plus ça change – evolutionary sequence divergence predicts protein subcellular localization signals Fukasawa, Yoshinori Leung, Ross KK Tsui, Stephen KW Horton, Paul BMC Genomics Research Article BACKGROUND: Protein subcellular localization is a central problem in understanding cell biology and has been the focus of intense research. To predict localization from amino acid sequence, a myriad of features have been tried, including amino acid composition, sequence similarity, the presence of certain motifs or domains, and many others. Surprisingly, sequence conservation of sorting motifs has not yet been employed, despite its extensive use for tasks such as the prediction of transcription factor binding sites. RESULTS: Here, we flip the problem around and present a proof of concept for the idea that the lack of sequence conservation can be a novel feature for localization prediction. We show that for yeast, mammal and plant datasets, evolutionary sequence divergence alone has significant power to identify sequences with N-terminal sorting sequences. Moreover, sequence divergence is nearly as effective when computed on automatically defined ortholog sets as on hand-curated ones. Unfortunately, sequence divergence did not necessarily increase classification performance when combined with some traditional sequence features such as amino acid composition. However, a post-hoc analysis of the proteins in which sequence divergence changes the prediction yielded some proteins with atypical (i.e., not MPP-cleaved) matrix targeting signals as well as a few misannotations. CONCLUSION: We report the results of the first quantitative study of the effectiveness of evolutionary sequence divergence as a feature for protein subcellular localization prediction. We show that divergence is indeed useful for prediction, but it is not trivial to improve overall accuracy simply by adding this feature to classical sequence features. Nevertheless, we argue that sequence divergence is a promising feature and show anecdotal examples in which it succeeds where other features fail. BioMed Central 2014-01-20 /pmc/articles/PMC3906766/ /pubmed/24438075 http://dx.doi.org/10.1186/1471-2164-15-46 Text en Copyright © 2014 Fukasawa et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Fukasawa, Yoshinori
Leung, Ross KK
Tsui, Stephen KW
Horton, Paul
Plus ça change – evolutionary sequence divergence predicts protein subcellular localization signals
title Plus ça change – evolutionary sequence divergence predicts protein subcellular localization signals
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3906766/
https://www.ncbi.nlm.nih.gov/pubmed/24438075
http://dx.doi.org/10.1186/1471-2164-15-46