DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool

Bibliographic Details
Main authors: Motion, Graham B., Howden, Andrew J. M., Huitema, Edgar, Jones, Susan
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678848/
https://www.ncbi.nlm.nih.gov/pubmed/26304539
http://dx.doi.org/10.1093/nar/gkv805
_version_ 1782405517496287232
author Motion, Graham B.
Howden, Andrew J. M.
Huitema, Edgar
Jones, Susan
author_facet Motion, Graham B.
Howden, Andrew J. M.
Huitema, Edgar
Jones, Susan
author_sort Motion, Graham B.
collection PubMed
description There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species and lineage specific models for the prediction of DNA-binding proteins in plants. We show that a species specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
format Online
Article
Text
id pubmed-4678848
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4678848 2015-12-16 DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool Motion, Graham B. Howden, Andrew J. M. Huitema, Edgar Jones, Susan Nucleic Acids Res Methods Online There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species and lineage specific models for the prediction of DNA-binding proteins in plants. We show that a species specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes. Oxford University Press 2015-12-15 2015-08-24 /pmc/articles/PMC4678848/ /pubmed/26304539 http://dx.doi.org/10.1093/nar/gkv805 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Online
Motion, Graham B.
Howden, Andrew J. M.
Huitema, Edgar
Jones, Susan
DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title_full DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title_fullStr DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title_full_unstemmed DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title_short DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
title_sort dna-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678848/
https://www.ncbi.nlm.nih.gov/pubmed/26304539
http://dx.doi.org/10.1093/nar/gkv805
work_keys_str_mv AT motiongrahamb dnabindingproteinpredictionusingplantspecificsupportvectormachinesvalidationandapplicationofanewgenomeannotationtool
AT howdenandrewjm dnabindingproteinpredictionusingplantspecificsupportvectormachinesvalidationandapplicationofanewgenomeannotationtool
AT huitemaedgar dnabindingproteinpredictionusingplantspecificsupportvectormachinesvalidationandapplicationofanewgenomeannotationtool
AT jonessusan dnabindingproteinpredictionusingplantspecificsupportvectormachinesvalidationandapplicationofanewgenomeannotationtool