
iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

Bibliographic Details
Main Authors: Lin, Wei-Zhong, Fang, Jian-An, Xiao, Xuan, Chou, Kuo-Chen
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3174210/
https://www.ncbi.nlm.nih.gov/pubmed/21935457
http://dx.doi.org/10.1371/journal.pone.0024756
author Lin, Wei-Zhong
Fang, Jian-An
Xiao, Xuan
Chou, Kuo-Chen
collection PubMed
description DNA-binding proteins play crucial roles in various cellular processes. Developing high-throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in genome annotation. Although many efforts have been made in this regard, further work is needed to enhance prediction power. By incorporating features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, that identifies uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. iDNA-Prot achieved an overall success rate of 83.96% in jackknife tests on a newly constructed stringent benchmark dataset in which none of the included proteins has [Image: see text] pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, iDNA-Prot requires remarkably less computational time than the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to get the desired results.
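The abstract names the "grey model" as the feature extractor but does not describe it. As an illustration only, the following minimal sketch fits the standard GM(1,1) grey model to a numeric series and returns its two coefficients (a, b), values of the kind that could be appended as extra pseudo amino acid composition components. How iDNA-Prot maps residues to numbers, and which grey-model quantities it actually uses, is not stated in this record; the input series and the function name `gm11_coefficients` are hypothetical.

```python
from typing import List, Tuple


def gm11_coefficients(x0: List[float]) -> Tuple[float, float]:
    """Fit the standard GM(1,1) grey model to a numeric series.

    Hypothetical helper: illustrates textbook GM(1,1), not necessarily
    iDNA-Prot's exact feature encoding. Returns (a, b) from the grey
    differential equation x0(k) + a * z1(k) = b, estimated by ordinary
    least squares over k = 2..n.
    """
    n = len(x0)
    if n < 3:
        raise ValueError("GM(1,1) needs at least 3 points")

    # 1-AGO: accumulated generating operation (running sum of x0).
    x1: List[float] = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)

    # Regressor u(k) = -z1(k), with background value
    # z1(k) = 0.5 * (x1(k) + x1(k-1)) for k = 2..n.
    u = [-0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    m = n - 1

    # Closed-form least squares for y = a*u + b (2x2 normal equations).
    su = sum(u)
    suu = sum(v * v for v in u)
    sy = sum(y)
    suy = sum(p * q for p, q in zip(u, y))
    det = m * suu - su * su
    if det == 0:
        raise ValueError("degenerate series: normal equations are singular")
    a = (m * suy - su * sy) / det
    b = (suu * sy - su * suy) / det
    return a, b
```

For example, `gm11_coefficients([1.0, 2.0, 4.0, 8.0])` returns approximately `(-0.667, 0.667)`, and a constant series yields `a = 0` with `b` equal to the constant. The 2x2 normal equations are solved in closed form to keep the sketch dependency-free; a NumPy least-squares call would serve equally well.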
format Online
Article
Text
id pubmed-3174210
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3174210 2011-09-20 PLoS One Research Article Public Library of Science 2011-09-15 Text en Lin et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3174210/
https://www.ncbi.nlm.nih.gov/pubmed/21935457
http://dx.doi.org/10.1371/journal.pone.0024756