Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks

BACKGROUND: DNA recognition by proteins is one of the most important processes in living systems. Therefore, understanding the recognition process in general, and identifying mutual recognition sites in proteins and DNA in particular, carries great significance. The sequence and structural dependenc...

Bibliographic Details
Main authors: Andrabi, Munazah; Mizuguchi, Kenji; Sarai, Akinori; Ahmad, Shandar
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2693520/
https://www.ncbi.nlm.nih.gov/pubmed/19439068
http://dx.doi.org/10.1186/1472-6807-9-30
author Andrabi, Munazah
Mizuguchi, Kenji
Sarai, Akinori
Ahmad, Shandar
collection PubMed
description BACKGROUND: DNA recognition by proteins is one of the most important processes in living systems. Therefore, understanding the recognition process in general, and identifying mutual recognition sites in proteins and DNA in particular, carries great significance. The sequence and structural dependence of DNA-binding sites in proteins has led to the development of successful machine learning methods for their prediction. However, all existing machine learning methods predict DNA-binding sites, irrespective of their target sequence and hence, none of them is helpful in identifying specific protein-DNA contacts. In this work, we formulate the problem of predicting specific DNA-binding sites in terms of contacts between the residue environments of proteins and the identity of a mononucleotide or a dinucleotide step in DNA. The aim of this work is to take a protein sequence or structural features as inputs and predict for each amino acid residue if it binds to DNA at locations identified by one of the four possible mononucleotides or one of the 10 unique dinucleotide steps. Contact predictions are made at various levels of resolution viz. in terms of side chain, backbone and major or minor groove atoms of DNA. RESULTS: Significant differences in residue preferences for specific contacts are observed, which combined with other features, lead to promising levels of prediction. In general, PSSM-based predictions, supported by secondary structure and solvent accessibility, achieve a good predictability of ~70–80%, measured by the area under the curve (AUC) of ROC graphs. The major and minor groove contact predictions stood out in terms of their poor predictability from sequences or PSSM, which was very strongly (>20 percentage points) compensated by the addition of secondary structure and solvent accessibility information, revealing a predominant role of local protein structure in the major/minor groove DNA-recognition. 
Following a detailed analysis of results, a web server to predict mononucleotide and dinucleotide-step contacts using PSSM was developed and made available at or . CONCLUSION: Most residue-nucleotide contacts can be predicted with high accuracy using only sequence and evolutionary information. Major and minor groove contacts, however, depend profoundly on the local structure. Overall, this study takes us a step closer to the ultimate goal of predicting mutual recognition sites in protein and DNA sequences.
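The figure of 10 unique dinucleotide steps in the description follows from DNA being double-stranded: a step read on one strand and its reverse complement read on the other strand describe the same physical step, so the 16 ordered dinucleotides collapse into 10 classes. The sketch below only illustrates that counting argument. The function names and the choice of the alphabetically smaller string as the class label are assumptions made here, not the labeling used by the authors.

```python
from itertools import product
from typing import List

# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the sequence read 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_step(step: str) -> str:
    """Pick one label per step (assumed convention: alphabetically smaller)."""
    return min(step, reverse_complement(step))


def unique_dinucleotide_steps() -> List[str]:
    """Collapse the 16 ordered dinucleotides into strand-independent steps."""
    all_steps = ("".join(pair) for pair in product("ACGT", repeat=2))
    return sorted({canonical_step(step) for step in all_steps})


if __name__ == "__main__":
    steps = unique_dinucleotide_steps()
    print(len(steps), steps)
```

Four of the steps (AT, CG, GC, TA) are their own reverse complement, and the remaining 12 ordered dinucleotides pair up into 6 classes, which gives 4 + 6 = 10.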
format Text
id pubmed-2693520
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2693520 2009-06-08 Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks Andrabi, Munazah Mizuguchi, Kenji Sarai, Akinori Ahmad, Shandar BMC Struct Biol Research Article BioMed Central 2009-05-13 /pmc/articles/PMC2693520/ /pubmed/19439068 http://dx.doi.org/10.1186/1472-6807-9-30 Text en Copyright © 2009 Andrabi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks
topic Research Article