
DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation



Bibliographic Details
Main authors: Liu, Bin; Wang, Shanyi; Wang, Xiaolong
Format: Online article, text
Language: English
Published: Nature Publishing Group 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4611492/
https://www.ncbi.nlm.nih.gov/pubmed/26482832
http://dx.doi.org/10.1038/srep15479
Collection: PubMed
Abstract: DNA-binding proteins play an important role in most cellular processes. It is therefore necessary to develop an efficient predictor that identifies DNA-binding proteins based only on their sequence information. The bottleneck in constructing a useful predictor is finding suitable features that capture the characteristics of DNA-binding proteins. We applied pseudo amino acid composition (PseAAC) to DNA-binding protein identification and further improved PseAAC by incorporating evolutionary information through a profile-based protein representation. Finally, by combining these features with Support Vector Machines (SVMs), a predictor called iDNAPro-PseAAC was proposed. Experimental results on an updated benchmark dataset showed that iDNAPro-PseAAC outperformed some state-of-the-art approaches and achieved stable performance on an independent dataset. Using an ensemble learning approach to incorporate more negative samples (non-DNA-binding proteins) in the training process further improved the performance of iDNAPro-PseAAC. The iDNAPro-PseAAC web server is available at http://bioinformatics.hitsz.edu.cn/iDNAPro-PseAAC/.
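The abstract combines two ingredients: PseAAC and a profile-based protein representation. As an illustration only, the sketch below computes Chou's type-1 PseAAC vector (20 normalized amino acid composition terms followed by λ sequence-order correlation factors weighted by w) for a sequence derived from a frequency profile by keeping the most frequent residue at each position. Several things here are assumptions rather than details from the paper: reading "profile-based representation" as this most-frequent-residue collapse, the Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity scales used as physicochemical properties, the defaults λ = 3 and w = 0.05, and the hypothetical helper names `profile_to_sequence` and `pseaac`. In practice the profile would come from a PSI-BLAST search, and the SVM and ensemble stages are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy and Hopp-Woods hydrophilicity are used as
# stand-in physicochemical properties; the paper's exact property set may differ.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
    "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
    "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
    "W": -0.9, "Y": -1.3,
}
HOPP_WOODS = {
    "A": -0.5, "C": -1.0, "D": 3.0, "E": 3.0, "F": -2.5, "G": 0.0,
    "H": -0.5, "I": -1.8, "K": 3.0, "L": -1.8, "M": -1.3, "N": 0.2,
    "P": 0.0, "Q": 0.2, "R": 3.0, "S": 0.3, "T": -0.4, "V": -1.5,
    "W": -3.4, "Y": -2.3,
}


def standardize(scale):
    """Rescale a property to zero mean and unit population std over the 20 amino acids."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


def profile_to_sequence(profile):
    """Hypothetical helper: keep the most frequent residue in each profile column."""
    return "".join(max(column, key=column.get) for column in profile)


def pseaac(sequence, lam=3, w=0.05, scales=(KYTE_DOOLITTLE, HOPP_WOODS)):
    """Type-1 PseAAC: 20 composition terms followed by lam correlation terms."""
    seq = sequence.upper()
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains non-standard residues")
    props = [standardize(s) for s in scales]

    def theta_pair(a, b):
        # Mean squared difference of the standardized properties of two residues.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # Normalized amino acid composition f_u (sums to 1).
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # Sequence-order correlation factors theta_k for k = 1..lam.
    thetas = [
        sum(theta_pair(seq[i], seq[i + k]) for i in range(len(seq) - k)) / (len(seq) - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


if __name__ == "__main__":
    # Toy frequency profile (hypothetical); real profiles come from PSI-BLAST.
    profile = [{"M": 0.7, "L": 0.3}, {"K": 0.5, "R": 0.4, "Q": 0.1},
               {"A": 0.6, "G": 0.4}, {"D": 0.8, "E": 0.2}, {"L": 0.9, "I": 0.1}]
    seq = profile_to_sequence(profile)
    vec = pseaac(seq, lam=2)
    print(seq, len(vec), round(sum(vec), 6))
```

Running the script prints `MKADL 22 1.0`: the toy profile collapses to the sequence MKADL, and with λ = 2 the vector has 20 + 2 components that sum to 1 by construction. In a full pipeline, vectors like these would be the input to the SVM classifier.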
Record ID: pubmed-4611492
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Scientific Reports)
Published: 2015-10-20
Copyright © 2015, Macmillan Publishers Limited. This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third-party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/