
Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets

Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein–ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a g...


Bibliographic Details
Main authors:	Kanakala, Ganesh Chandan; Aggarwal, Rishal; Nayar, Divya; Priyakumar, U. Deva
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2023
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9850481/ https://www.ncbi.nlm.nih.gov/pubmed/36687059 http://dx.doi.org/10.1021/acsomega.2c06781

author	Kanakala, Ganesh Chandan; Aggarwal, Rishal; Nayar, Divya; Priyakumar, U. Deva
collection	PubMed
description	Drug design involves the process of identifying and designing molecules that bind well to a given receptor. A vital computational component of this process is the protein–ligand interaction scoring functions that evaluate the binding ability of various molecules or ligands with a given protein receptor binding pocket reasonably accurately. With the publicly available protein–ligand binding affinity data sets in both sequential and structural forms, machine learning methods have gained traction as a top choice for developing such scoring functions. While the performance shown by these models is optimistic, there are several hidden biases present in these data sets themselves that affect the utility of such models for practical purposes such as virtual screening. In this work, we use published methods to systematically investigate several such factors or biases present in these data sets. In our analysis, we highlight the importance of considering sequence, protein–ligand interaction, and pocket structure similarity while constructing data splits and provide an explanation for good protein-only and ligand-only performances in some data sets. Through this study, we provide to the community several pointers for the design of binding affinity predictors and data sets for reliable applicability.
format	Online Article Text
id	pubmed-9850481
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-9850481 2023-01-20 Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets Kanakala, Ganesh Chandan; Aggarwal, Rishal; Nayar, Divya; Priyakumar, U. Deva ACS Omega American Chemical Society 2023-01-05 /pmc/articles/PMC9850481/ /pubmed/36687059 http://dx.doi.org/10.1021/acsomega.2c06781 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title	Latent Biases in Machine Learning Models for Predicting Binding Affinities Using Popular Data Sets


Similar Items