Machine-Learning-Based Data Analysis Method for Cell-Based Selection of DNA-Encoded Libraries

DNA-encoded library (DEL) is a powerful ligand discovery technology that has been widely adopted in the pharmaceutical industry. DEL selections are typically performed with a purified protein target immobilized on a matrix or in solution phase. Recently, DELs have also been used to...

Bibliographic Details
Main Authors: Hou, Rui; Xie, Chao; Gui, Yuhan; Li, Gang; Li, Xiaoyu
Format: Online Article Text
Language: English
Published: American Chemical Society, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10233830/
https://www.ncbi.nlm.nih.gov/pubmed/37273617
http://dx.doi.org/10.1021/acsomega.3c02152
_version_ 1785052345130811392
author Hou, Rui
Xie, Chao
Gui, Yuhan
Li, Gang
Li, Xiaoyu
author_sort Hou, Rui
collection PubMed
description DNA-encoded library (DEL) technology is a powerful ligand discovery method that has been widely adopted in the pharmaceutical industry. DEL selections are typically performed with a purified protein target, either immobilized on a matrix or in solution phase. Recently, DELs have also been used to interrogate targets in complex biological environments, such as membrane proteins on live cells. However, due to the complex landscape of the cell surface, such selections inevitably involve significant nonspecific interactions, and the selection data are much noisier than those obtained with purified proteins, making reliable hit identification highly challenging. Researchers have developed several approaches to denoise DEL datasets, but it remains unclear whether they are suitable for cell-based DEL selections. Here, we report a proof of principle for a new machine-learning (ML)-based approach to processing cell-based DEL selection datasets using a maximum a posteriori (MAP) estimation loss function, a probabilistic framework that can account for and quantify the uncertainties of noisy data. We applied the approach to a DEL selection dataset in which a library of 7,721,415 compounds was selected against purified carbonic anhydrase 2 (CA-2) and a cell line expressing the membrane protein carbonic anhydrase 12 (CA-12). An extended-connectivity fingerprint (ECFP)-based regression model using the MAP loss function was able to identify true binders and reliable structure–activity relationships (SARs) from the noisy cell-based selection datasets. In addition, the regularized enrichment metric (known as MAP enrichment) could be calculated directly, without involving a specific machine-learning model, effectively suppressing low-confidence outliers and enhancing the signal-to-noise ratio. Future applications of this method will focus on de novo ligand discovery from cell-based DEL selections.
format Online
Article
Text
id pubmed-10233830
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10233830 2023-06-02 Machine-Learning-Based Data Analysis Method for Cell-Based Selection of DNA-Encoded Libraries Hou, Rui; Xie, Chao; Gui, Yuhan; Li, Gang; Li, Xiaoyu ACS Omega American Chemical Society 2023-05-15 /pmc/articles/PMC10233830/ /pubmed/37273617 http://dx.doi.org/10.1021/acsomega.3c02152 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Machine-Learning-Based Data Analysis Method for Cell-Based Selection of DNA-Encoded Libraries
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10233830/
https://www.ncbi.nlm.nih.gov/pubmed/37273617
http://dx.doi.org/10.1021/acsomega.3c02152