Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein–DNA binding

Bibliographic Details
Main authors: Chiu, Tsu-Pei, Rao, Satyanarayan, Mann, Richard S., Honig, Barry, Rohs, Remo
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5716191/
https://www.ncbi.nlm.nih.gov/pubmed/29040720
http://dx.doi.org/10.1093/nar/gkx915
Description
Summary: Protein–DNA binding is a fundamental component of gene regulatory processes, but it is still not completely understood how proteins recognize their target sites in the genome. Besides hydrogen bonding in the major groove (base readout), proteins recognize minor-groove geometry using positively charged amino acids (shape readout). The underlying mechanism of DNA shape readout involves the correlation between minor-groove width and electrostatic potential (EP). To probe this biophysical effect directly, rather than using minor-groove width as an indirect measure for shape readout, we developed a methodology, DNAphi, for predicting EP in the minor groove and confirmed the direct role of EP in protein–DNA binding using massive sequencing data. The DNAphi method uses a sliding-window approach to mine results from non-linear Poisson–Boltzmann (NLPB) calculations on DNA structures derived from all-atom Monte Carlo simulations. We validated this approach, which only requires nucleotide sequence as input, based on direct comparison with NLPB calculations for available crystal structures. Using statistical machine-learning approaches, we showed that adding EP as a biophysical feature can improve the predictive power of quantitative binding specificity models across 27 transcription factor families. High-throughput prediction of EP offers a novel way to integrate biophysical and genomic studies of protein–DNA binding.
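The sliding-window idea mentioned in the summary can be illustrated with a minimal sketch. This is not the DNAphi implementation: the function name `sliding_window_predict`, the window length of 5, and the lookup table `TOY_TABLE` are all assumptions made for illustration, and the numbers in the table are arbitrary placeholders rather than computed electrostatic potentials. The sketch only shows how a precomputed table keyed by short sequence windows could yield a per-position value from nucleotide sequence alone.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: window sequence -> value at the window centre.
# The numbers are arbitrary placeholders, NOT real electrostatic potentials.
TOY_TABLE: Dict[str, float] = {
    "AAATT": -1.0,
    "AATTC": -2.0,
}


def sliding_window_predict(
    sequence: str,
    table: Dict[str, float],
    k: int = 5,  # assumed window length; must be odd so the window has a centre
) -> List[Optional[float]]:
    """Assign each position the table value of the k-mer centred on it.

    Positions too close to either end to be the centre of a full window,
    and windows missing from the table, are reported as None.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")

    seq = sequence.upper()
    half = k // 2
    predictions: List[Optional[float]] = [None] * len(seq)

    # Slide the window one nucleotide at a time along the sequence.
    for centre in range(half, len(seq) - half):
        window = seq[centre - half : centre + half + 1]
        predictions[centre] = table.get(window)

    return predictions


if __name__ == "__main__":
    print(sliding_window_predict("AAATTC", TOY_TABLE))
    # [None, None, -1.0, -2.0, None, None]
```

In this sketch the two terminal positions on each side receive no value because no full window is centred on them; how a real tool treats sequence ends is not stated in the summary.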