
Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins

Protein–DNA complexes play vital roles in many cellular processes by the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods showed differen...


Bibliographic Details
Main Authors:	Nagarajan, R., Ahmad, Shandar, Michael Gromiha, M.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2013
Subjects:	Computational Biology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3763535/ https://www.ncbi.nlm.nih.gov/pubmed/23788679 http://dx.doi.org/10.1093/nar/gkt544

author	Nagarajan, R. Ahmad, Shandar Michael Gromiha, M.
collection	PubMed
description	Protein–DNA complexes play vital roles in many cellular processes by the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods showed different levels of accuracy, which may depend on the choice of data sets used in training, the feature sets selected for developing a predictive model, the ability of the models to capture information useful for prediction or a combination of these factors. In many cases, different methods are likely to produce similar results, whereas in others, the predictors may return contradictory predictions. In this situation, a priori estimates of prediction performance applicable to the system being investigated would be helpful for biologists to choose the best method for designing their experiments. In this work, we have constructed unbiased, stringent and diverse data sets for DNA-binding proteins based on various biologically relevant considerations: (i) seven structural classes, (ii) 86 folds, (iii) 106 superfamilies, (iv) 194 families, (v) 15 binding motifs, (vi) single/double-stranded DNA, (vii) DNA conformation (A, B, Z, etc.), (viii) three functions and (ix) disordered regions. These data sets were culled as non-redundant with sequence identities of 25 and 40% and used to evaluate the performance of 11 different methods for which online services or standalone programs are available. We observed that the best performing methods for each of the data sets showed significant biases toward the data sets selected for their benchmark. Our analysis revealed important data set features that could be used to estimate these context-specific biases and hence suggest the best method to be used for a given problem. We have developed a web server that considers these features on demand and displays the best method that the investigator should use. The web server is freely available at http://www.biotech.iitm.ac.in/DNA-protein/. Further, we have grouped the methods based on their complexity and analyzed the performance. The information gained in this work could be effectively used to select the best method for designing experiments.
format	Online Article Text
id	pubmed-3763535
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3763535 2013-09-10 Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins Nagarajan, R. Ahmad, Shandar Michael Gromiha, M. Nucleic Acids Res Computational Biology Oxford University Press 2013-09 2013-06-20 /pmc/articles/PMC3763535/ /pubmed/23788679 http://dx.doi.org/10.1093/nar/gkt544 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins
topic	Computational Biology
