PreDBA: A heterogeneous ensemble approach for predicting protein-DNA binding affinity

Bibliographic details
Main authors: Yang, Wenyi; Deng, Lei
Format: Online article, text
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6987227/
https://www.ncbi.nlm.nih.gov/pubmed/31992738
http://dx.doi.org/10.1038/s41598-020-57778-1
collection PubMed
description The interaction between proteins and DNA plays an essential role in many critical biological processes, such as DNA replication, transcription, splicing, and repair. Studying the binding affinity of proteins to DNA helps to elucidate the recognition mechanism of protein-DNA complexes. Because experimentally measured protein-DNA binding affinity data remain limited, accurate and reliable computational methods are needed. In this paper, we propose PreDBA, a computational approach that predicts protein-DNA binding affinity using heterogeneous ensemble models. We manually collected 100 protein-DNA complexes from the literature to build a protein-DNA binding affinity data set, extracted 52 sequence and structural features, and calculated the correlation between each of these features and binding affinity. We also found that binding affinity is affected by the structure of the DNA molecule in the complex. We then classified all protein-DNA complexes into five categories based on the structure of the DNA bound by the protein in each complex. For each category, we constructed a stacked heterogeneous ensemble model from the extracted features. Finally, we evaluated the method comprehensively on the binding affinity data set using leave-one-out cross-validation. Across the five categories, PreDBA achieves Pearson correlation coefficients ranging from 0.735 to 0.926. We also demonstrate the advantages of the proposed method over other machine learning methods and an existing protein-DNA binding affinity prediction approach.
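The evaluation protocol summarized in the abstract above (a stacked heterogeneous ensemble trained per category and scored by Pearson's r under leave-one-out cross-validation) can be illustrated with a small, dependency-free Python sketch. It assumes one category's data is given as a list of feature vectors `X` and binding affinities `y`. The two base learners (k-nearest neighbours and a least-squares line on the single feature most correlated with affinity), the grid-searched blending meta-learner, every function name, and the toy data are hypothetical illustrations; they are not the models or data used in PreDBA.

```python
import math
from statistics import mean

# Hypothetical sketch: a two-member stacked ensemble evaluated with
# leave-one-out cross-validation (LOOCV) and scored by Pearson's r.
# The base learners and meta-learner are illustrative choices, NOT PreDBA's.


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; treat as no correlation
    return sxy / math.sqrt(sxx * syy)


def fit_knn(X, y, k=3):
    """Base learner 1 (hypothetical): k-nearest-neighbour regression."""
    def predict(x):
        nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(X, y))[:k]
        return mean(yi for _, yi in nearest)
    return predict


def fit_best_feature_linear(X, y):
    """Base learner 2 (hypothetical): least-squares line on the single feature
    whose |Pearson r| with the target is largest in the training data."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    j = max(range(len(cols)), key=lambda c: abs(pearson(cols[c], y)))
    xs = cols[j]
    mx, my = mean(xs), mean(y)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = 0.0 if sxx == 0 else sum((a - mx) * (t - my) for a, t in zip(xs, y)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x[j]


BASE_LEARNERS = [fit_knn, fit_best_feature_linear]


def fit_stacked(X, y):
    """Train the base learners, then a meta-learner on their out-of-fold predictions."""
    n = len(X)
    # Inner leave-one-out: each base prediction for sample i comes from a
    # model that was trained without sample i.
    oof = [[0.0] * n for _ in BASE_LEARNERS]
    for i in range(n):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        for m, fit in enumerate(BASE_LEARNERS):
            oof[m][i] = fit(X_tr, y_tr)(X[i])

    # Meta-learner (hypothetical): convex blend w*p1 + (1-w)*p2, with w picked
    # from a 0.05-step grid to minimise squared error on the out-of-fold predictions.
    def sse(w):
        return sum((w * p1 + (1 - w) * p2 - t) ** 2
                   for p1, p2, t in zip(oof[0], oof[1], y))

    w = min((s / 20 for s in range(21)), key=sse)
    models = [fit(X, y) for fit in BASE_LEARNERS]  # refit on all training data
    return lambda x: w * models[0](x) + (1 - w) * models[1](x)


def loocv_pearson(X, y):
    """Outer LOOCV: hold out each sample once, train on the rest, predict it."""
    preds = []
    for i in range(len(X)):
        model = fit_stacked(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(X[i]))
    return pearson(preds, y), preds


# Toy data (hypothetical, not from the PreDBA data set): feature 0 is exactly
# linear in the target; features 1 and 2 are periodic nuisance values.
X = [[i, (i * 7) % 5, (i * 3) % 4] for i in range(12)]
y = [2.0 * i + 1.0 for i in range(12)]
r, preds = loocv_pearson(X, y)
print(f"LOOCV Pearson r = {r:.3f}")
```

The inner leave-one-out loop feeds the meta-learner only out-of-fold base predictions, the usual safeguard against a stacked model rewarding base learners that overfit. Nesting it inside the outer loop multiplies the number of model fits by the number of samples, which remains cheap at the scale of a few dozen to a hundred complexes.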
id pubmed-6987227
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6987227 2020-02-03 Sci Rep (Scientific Reports) Article
Nature Publishing Group UK, 2020-01-28. © The Author(s) 2020.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.