DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool
There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time consuming fu...
Main authors: | Motion, Graham B.; Howden, Andrew J. M.; Huitema, Edgar; Jones, Susan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2015 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678848/ https://www.ncbi.nlm.nih.gov/pubmed/26304539 http://dx.doi.org/10.1093/nar/gkv805 |
_version_ | 1782405517496287232 |
---|---|
author | Motion, Graham B.; Howden, Andrew J. M.; Huitema, Edgar; Jones, Susan |
author_sort | Motion, Graham B. |
collection | PubMed |
description | There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome wide predictions in plant species. Here, we explore the use of species and lineage specific models for the prediction of DNA-binding proteins in plants. We show that a species specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes. |
format | Online Article Text |
id | pubmed-4678848 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4678848 2015-12-16 DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool. Motion, Graham B.; Howden, Andrew J. M.; Huitema, Edgar; Jones, Susan. Nucleic Acids Res. Methods Online. Oxford University Press 2015-12-15 2015-08-24 /pmc/articles/PMC4678848/ /pubmed/26304539 http://dx.doi.org/10.1093/nar/gkv805 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4678848/ https://www.ncbi.nlm.nih.gov/pubmed/26304539 http://dx.doi.org/10.1093/nar/gkv805 |