SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data

Bibliographic Details
Main Authors: Ryngajllo, Malgorzata; Childs, Liam; Lohse, Marc; Giorgi, Federico M.; Lude, Anja; Selbig, Joachim; Usadel, Björn
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation 2011
Subjects: Plant Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3355584/
https://www.ncbi.nlm.nih.gov/pubmed/22639594
http://dx.doi.org/10.3389/fpls.2011.00043
author Ryngajllo, Malgorzata
Childs, Liam
Lohse, Marc
Giorgi, Federico M.
Lude, Anja
Selbig, Joachim
Usadel, Björn
collection PubMed
description Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well-performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence-derived information, expression behavior, or a combination of these data, and compared their predictive performance through a cross-validation test. We show that gene expression carries information about subcellular localization that is not available from sequence information, yielding dramatic benefits for plastid localization prediction and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only a partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that advances in gene expression measurement technology will make our method applicable to all Arabidopsis proteins.
format Online
Article
Text
id pubmed-3355584
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3355584 2012-05-25 SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data. Ryngajllo, Malgorzata; Childs, Liam; Lohse, Marc; Giorgi, Federico M.; Lude, Anja; Selbig, Joachim; Usadel, Björn. Front Plant Sci. Plant Science. Frontiers Research Foundation 2011-09-12 /pmc/articles/PMC3355584/ /pubmed/22639594 http://dx.doi.org/10.3389/fpls.2011.00043 Text en Copyright © 2011 Ryngajllo, Childs, Lohse, Giorgi, Lude, Selbig and Usadel. http://www.frontiersin.org/licenseagreement This is an open-access article subject to a non-exclusive license between the authors and Frontiers Media SA, which permits use, distribution and reproduction in other forums, provided the original authors and source are credited and other Frontiers conditions are complied with.
title SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data
topic Plant Science