
Identification of DNA-binding proteins using support vector machines and evolutionary profiles

Bibliographic Details
Main authors: Kumar, Manish, Gromiha, Michael M, Raghava, Gajendra PS
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2216048/
https://www.ncbi.nlm.nih.gov/pubmed/18042272
http://dx.doi.org/10.1186/1471-2105-8-463
collection PubMed
description BACKGROUND: Identification of DNA-binding proteins is one of the major challenges in genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM models for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.
RESULTS: SVM models were developed on DNAaset, which consists of 1153 DNA-binding proteins and an equal number of non-DNA-binding proteins, and achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models were developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset.
CONCLUSION: A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences.
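The abstract compares SVM inputs built from amino acid composition and dipeptide composition. As a minimal sketch, assuming the common definitions (the percentage of each of the 20 standard residues, and the percentage of each of the 400 ordered pairs of adjacent residues), the two fixed-length feature vectors could be computed as below. This is not the authors' code: the function names are hypothetical, residues outside the 20 standard one-letter codes are simply skipped (an assumption), and the PSSM-based encoding, which requires PSI-BLAST profiles, is not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes), in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """Hypothetical helper: percentage of each standard residue (20 features).

    Assumption: non-standard letters (e.g. X, B, Z) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Hypothetical helper: percentage of each of the 400 dipeptides.

    Counts overlapping adjacent pairs; pairs containing a non-standard
    letter are skipped (an assumption, not taken from the paper).
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    aac = amino_acid_composition("MKVLAAGK")
    dpc = dipeptide_composition("MKVLAAGK")
    print(len(aac), len(dpc))  # 20 400
```

Each protein then maps to a 20- or 400-dimensional vector regardless of its length, which is what makes these encodings usable as fixed-size SVM inputs.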
id pubmed-2216048
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source BMC Bioinformatics (Research Article), BioMed Central, 2007-11-27; record pubmed-2216048, 2008-01-29
rights Copyright © 2007 Kumar et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article