BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes

BACKGROUND: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identi...

Bibliographic Details
Main authors: Ashtawy, Hossam M; Mahapatra, Nihar R
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347622/
https://www.ncbi.nlm.nih.gov/pubmed/25734685
http://dx.doi.org/10.1186/1471-2105-16-S4-S8
author Ashtawy, Hossam M
Mahapatra, Nihar R
collection PubMed
description BACKGROUND: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity.
RESULTS: We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has been traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests.
CONCLUSIONS: Ensemble neural networks SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher when they are used to predict binding affinities of protein-ligand complexes that are related to their training sets.
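The accuracy figures in the abstract are Pearson correlation coefficients between predicted and measured binding affinities, and the percentages appear to be relative gains over a baseline coefficient. The following is a minimal sketch of both calculations, not the authors' code. It assumes the percentages are computed as (new - baseline) / baseline; the function names pearson_r and relative_gain are hypothetical, and only the four coefficients are taken from the abstract.

```python
from math import sqrt


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_p * var_m)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent.

    Assumption: this is how the abstract's percentages are defined.
    """
    return (new - baseline) / baseline * 100.0


# Coefficients quoted in the abstract (2007 PDBbind benchmark).
ensemble = {"BgN-Score": 0.804, "BsN-Score": 0.816}
baselines = {"conventional SF": 0.644, "single NN": 0.675}

# Relative gain of each ensemble SF over each baseline.
gains = {
    (sf, base): relative_gain(r, r_base)
    for sf, r in ensemble.items()
    for base, r_base in baselines.items()
}

if __name__ == "__main__":
    for (sf, base), gain in gains.items():
        print(f"{sf} vs. {base}: {gain:.1f}% higher Pearson's r")
```

Under that reading, the relative gains fall between roughly 19% and 27%, in line with the magnitudes quoted in the abstract. In practice pearson_r would be applied to the predicted and measured affinities of the test complexes; those values are not part of this record.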
format Online
Article
Text
id pubmed-4347622
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4347622 2015-03-19 BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes. Ashtawy, Hossam M; Mahapatra, Nihar R. BMC Bioinformatics. Research. BioMed Central 2015-02-23 /pmc/articles/PMC4347622/ /pubmed/25734685 http://dx.doi.org/10.1186/1471-2105-16-S4-S8 Text en Copyright © 2015 Ashtawy and Mahapatra; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347622/
https://www.ncbi.nlm.nih.gov/pubmed/25734685
http://dx.doi.org/10.1186/1471-2105-16-S4-S8