Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest
Main authors: | Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6272292/ https://www.ncbi.nlm.nih.gov/pubmed/26076113 http://dx.doi.org/10.3390/molecules200610947 |
_version_ | 1783377121164394496 |
---|---|
author | Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J. |
collection | PubMed |
description | Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function on a dataset that includes low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data. |
format | Online Article Text |
id | pubmed-6272292 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6272292 (2018-12-31); Molecules; Article; MDPI, 2015-06-12; /pmc/articles/PMC6272292/ /pubmed/26076113 http://dx.doi.org/10.3390/molecules200610947; Text; en; © 2015 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/). |
title | Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest |
topic | Article |