Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets
Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein–ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a g...
Main Authors: | Kanakala, Ganesh Chandan; Aggarwal, Rishal; Nayar, Divya; Priyakumar, U. Deva |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850481/ https://www.ncbi.nlm.nih.gov/pubmed/36687059 http://dx.doi.org/10.1021/acsomega.2c06781 |
_version_ | 1784872195987603456 |
---|---|
author | Kanakala, Ganesh Chandan Aggarwal, Rishal Nayar, Divya Priyakumar, U. Deva |
author_facet | Kanakala, Ganesh Chandan Aggarwal, Rishal Nayar, Divya Priyakumar, U. Deva |
author_sort | Kanakala, Ganesh Chandan |
collection | PubMed |
description | Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein–ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a given protein receptor binding pocket reasonably accurately. With the publicly available protein–ligand binding affinity data sets in both sequential and structural forms, machine learning methods have gained traction as a top choice for developing such scoring functions. While the performance shown by these models is optimistic, there are several hidden biases present in these data sets themselves that affect the utility of such models for practical purposes such as virtual screening. In this work, we use published methods to systematically investigate several such factors or biases present in these data sets. In our analysis, we highlight the importance of considering sequence, protein–ligand interaction, and pocket structure similarity while constructing data splits and provide an explanation for good protein-only and ligand-only performances in some data sets. Through this study, we provide to the community several pointers for the design of binding affinity predictors and data sets for reliable applicability. |
format | Online Article Text |
id | pubmed-9850481 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9850481 2023-01-20 Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets Kanakala, Ganesh Chandan Aggarwal, Rishal Nayar, Divya Priyakumar, U. Deva ACS Omega Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein–ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a given protein receptor binding pocket reasonably accurately. With the publicly available protein–ligand binding affinity data sets in both sequential and structural forms, machine learning methods have gained traction as a top choice for developing such scoring functions. While the performance shown by these models is optimistic, there are several hidden biases present in these data sets themselves that affect the utility of such models for practical purposes such as virtual screening. In this work, we use published methods to systematically investigate several such factors or biases present in these data sets. In our analysis, we highlight the importance of considering sequence, protein–ligand interaction, and pocket structure similarity while constructing data splits and provide an explanation for good protein-only and ligand-only performances in some data sets. Through this study, we provide to the community several pointers for the design of binding affinity predictors and data sets for reliable applicability. American Chemical Society 2023-01-05 /pmc/articles/PMC9850481/ /pubmed/36687059 http://dx.doi.org/10.1021/acsomega.2c06781 Text en © 2023 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Kanakala, Ganesh Chandan Aggarwal, Rishal Nayar, Divya Priyakumar, U. Deva Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title | Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title_full | Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title_fullStr | Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title_full_unstemmed | Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title_short | Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets |
title_sort | latent biases in machine learning models for predicting binding affinities using popular data sets |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850481/ https://www.ncbi.nlm.nih.gov/pubmed/36687059 http://dx.doi.org/10.1021/acsomega.2c06781 |
work_keys_str_mv | AT kanakalaganeshchandan latentbiasesinmachinelearningmodelsforpredictingbindingaffinitiesusingpopulardatasets AT aggarwalrishal latentbiasesinmachinelearningmodelsforpredictingbindingaffinitiesusingpopulardatasets AT nayardivya latentbiasesinmachinelearningmodelsforpredictingbindingaffinitiesusingpopulardatasets AT priyakumarudeva latentbiasesinmachinelearningmodelsforpredictingbindingaffinitiesusingpopulardatasets |