Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins

BACKGROUND: Molecular docking is a widely-employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of...

Bibliographic Details
Main authors: Ashtawy, Hossam M, Mahapatra, Nihar R
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416170/
https://www.ncbi.nlm.nih.gov/pubmed/25916860
http://dx.doi.org/10.1186/1471-2105-16-S6-S3
_version_ 1782369189677236224
author Ashtawy, Hossam M
Mahapatra, Nihar R
collection PubMed
description BACKGROUND: Molecular docking is a widely-employed method in structure-based drug design. An essential component of molecular docking programs is a scoring function (SF) that can be used to identify the most stable binding pose of a ligand, when bound to a receptor protein, from among a large set of candidate poses. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited docking power (or ability to successfully identify the correct pose) has been a major impediment to cost-effective drug discovery. Therefore, in this work, we explore a range of novel SFs employing different machine-learning (ML) approaches in conjunction with physicochemical and geometrical features characterizing protein-ligand complexes to predict the native or near-native pose of a ligand docked to a receptor protein's binding site. We assess the docking accuracies of these new ML SFs as well as those of conventional SFs in the context of the 2007 PDBbind benchmark dataset on both diverse and homogeneous (protein-family-specific) test sets. Further, we perform a systematic analysis of the performance of the proposed SFs in identifying native poses of ligands that are docked to novel protein targets. RESULTS AND CONCLUSION: We find that the best performing ML SF has a success rate of 80% in identifying poses that are within 1 Å root-mean-square deviation from the native poses of 65 different protein families. This is in comparison to a success rate of only 70% achieved by the best conventional SF, ASP, employed in the commercial docking software GOLD. In addition, the proposed ML SFs perform better on novel proteins that they were never trained on before. We also observed steady gains in the performance of these scoring functions as the training set size and number of features were increased by considering more protein-ligand complexes and/or more computationally-generated poses for each complex.
format Online
Article
Text
id pubmed-4416170
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4416170 2015-05-07 Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins Ashtawy, Hossam M Mahapatra, Nihar R BMC Bioinformatics Research BioMed Central 2015-04-17 /pmc/articles/PMC4416170/ /pubmed/25916860 http://dx.doi.org/10.1186/1471-2105-16-S6-S3 Text en Copyright © 2015 Ashtawy and Mahapatra; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Machine-learning scoring functions for identifying native poses of ligands docked to known and novel proteins
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4416170/
https://www.ncbi.nlm.nih.gov/pubmed/25916860
http://dx.doi.org/10.1186/1471-2105-16-S6-S3
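
The abstract in this record reports docking power as a success rate: the share of complexes for which the top-scored pose lies within 1 Å root-mean-square deviation (RMSD) of the native pose. A minimal sketch of that metric follows. It is not the authors' code. The function names, the data layout, and the toy coordinates are hypothetical, and it assumes that a higher score means a better pose and that each pose shares the native structure's coordinate frame and atom ordering (no superposition or symmetry correction).

```python
import math


def rmsd(pose, native):
    """RMSD (in Å) between two equal-length lists of (x, y, z) atom coordinates."""
    if len(pose) != len(native) or not pose:
        raise ValueError("pose and native must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for p_atom, n_atom in zip(pose, native)
        for a, b in zip(p_atom, n_atom)
    )
    return math.sqrt(squared / len(pose))


def docking_success_rate(complexes, threshold=1.0):
    """Fraction of complexes whose best-scored pose is within `threshold` Å of native.

    Each complex is a dict with (hypothetical layout):
      "native": list of (x, y, z) tuples
      "poses":  list of (score, coords) pairs; higher score = better (assumption)
    """
    if not complexes:
        raise ValueError("need at least one complex")
    hits = 0
    for cplx in complexes:
        # Pick the pose the scoring function ranks first.
        _, best_coords = max(cplx["poses"], key=lambda pair: pair[0])
        if rmsd(best_coords, cplx["native"]) <= threshold:
            hits += 1
    return hits / len(complexes)


# Toy data: two complexes with two-atom "ligands" (made up for illustration).
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_complexes = [
    {   # top-scored pose is 0.5 Å from native: counted as a success
        "native": native,
        "poses": [(0.9, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),
                  (0.2, [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)])],
    },
    {   # top-scored pose is 2.0 Å from native: counted as a failure
        "native": native,
        "poses": [(0.8, [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]),
                  (0.3, [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)])],
    },
]

success = docking_success_rate(toy_complexes, threshold=1.0)
```

In this toy set the scoring function ranks a near-native pose first for one complex and a distant pose first for the other, so the second complex counts as a miss even though a near-native pose exists among its candidates.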