BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes
BACKGROUND: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identi...
Main authors: | Ashtawy, Hossam M; Mahapatra, Nihar R |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4347622/ https://www.ncbi.nlm.nih.gov/pubmed/25734685 http://dx.doi.org/10.1186/1471-2105-16-S4-S8 |
_version_ | 1782359850625269760 |
---|---|
author | Ashtawy, Hossam M; Mahapatra, Nihar R |
collection | PubMed |
description | BACKGROUND: Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity. RESULTS: We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has been traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests. CONCLUSIONS: Ensemble neural networks SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. Moreover, their accuracies are even higher when they are used to predict binding affinities of protein-ligand complexes that are related to their training sets. |
format | Online Article Text |
id | pubmed-4347622 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4347622 2015-03-19 BMC Bioinformatics Research BioMed Central 2015-02-23 /pmc/articles/PMC4347622/ /pubmed/25734685 http://dx.doi.org/10.1186/1471-2105-16-S4-S8 Text en Copyright © 2015 Ashtawy and Mahapatra; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes |
topic | Research |
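
The percentage improvements quoted in the description can be checked from the correlation coefficients it reports. The following is a minimal sketch, assuming the percentages are relative gains computed as (new - baseline) / baseline; the names `relative_gain`, `gain_table`, `ENSEMBLE`, and `BASELINES` are illustrative and do not come from the article. Because only the rounded three-decimal coefficients are available in this record, the results may differ slightly from figures derived from unrounded values.

```python
# Pearson correlation coefficients as quoted in the record's description
# (rounded to three decimals there).
ENSEMBLE = {"BgN-Score": 0.804, "BsN-Score": 0.816}
BASELINES = {"conventional SF": 0.644, "single neural network SF": 0.675}


def relative_gain(new: float, baseline: float) -> float:
    """Fractional improvement of `new` over `baseline` (assumed definition)."""
    return (new - baseline) / baseline


def gain_table() -> dict:
    """Relative gain of every ensemble SF over every baseline SF."""
    return {
        (name, base_name): relative_gain(r, base_r)
        for name, r in ENSEMBLE.items()
        for base_name, base_r in BASELINES.items()
    }


if __name__ == "__main__":
    for (name, base_name), gain in gain_table().items():
        print(f"{name} vs. {base_name}: {gain * 100:.1f}% higher correlation")
```

With the rounded inputs, the gains over the single neural network SF come out at roughly 19% and 21%, in line with the "at least 19%" statement. The gains over the conventional SF come out at roughly 24.8% and 26.7%, so the "more than 25%" statement holds for BsN-Score and is only approximate for BgN-Score under this definition.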