Cargando…

Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning

Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract...

Descripción completa

Detalles Bibliográficos
Autores principales: Gu, Yaowen, Li, Jiao, Kang, Hongyu, Zhang, Bowen, Zheng, Si
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10459669/
https://www.ncbi.nlm.nih.gov/pubmed/37630234
http://dx.doi.org/10.3390/molecules28165982
_version_ 1785097467303297024
author Gu, Yaowen
Li, Jiao
Kang, Hongyu
Zhang, Bowen
Zheng, Si
author_facet Gu, Yaowen
Li, Jiao
Kang, Hongyu
Zhang, Bowen
Zheng, Si
author_sort Gu, Yaowen
collection PubMed
description Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, the relative bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data and consolidated them into a large-scale bioactivity benchmark dataset, which totally includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, graph convolutional network (GCN) and equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed with attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation for bioactivity prediction, as well as the reasonability and non-redundancy of deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications.
format Online
Article
Text
id pubmed-10459669
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-104596692023-08-27 Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning Gu, Yaowen Li, Jiao Kang, Hongyu Zhang, Bowen Zheng, Si Molecules Article Ligand-based virtual screening (LBVS) is a promising approach for rapid and low-cost screening of potentially bioactive molecules in the early stage of drug discovery. Compared with traditional similarity-based machine learning methods, deep learning frameworks for LBVS can more effectively extract high-order molecule structure representations from molecular fingerprints or structures. However, the 3D conformation of a molecule largely influences its bioactivity and physical properties, and has rarely been considered in previous deep learning-based LBVS methods. Moreover, the relative bioactivity benchmark dataset is still lacking. To address these issues, we introduce a novel end-to-end deep learning architecture trained from molecular conformers for LBVS. We first extracted molecule conformers from multiple public molecular bioactivity data and consolidated them into a large-scale bioactivity benchmark dataset, which totally includes millions of endpoints and molecules corresponding to 954 targets. Then, we devised a deep learning-based LBVS called EquiVS to learn molecule representations from conformers for bioactivity prediction. Specifically, graph convolutional network (GCN) and equivariant graph neural network (EGNN) are sequentially stacked to learn high-order molecule-level and conformer-level representations, followed with attention-based deep multiple-instance learning (MIL) to aggregate these representations and then predict the potential bioactivity for the query molecule on a given target. We conducted various experiments to validate the data quality of our benchmark dataset, and confirmed EquiVS achieved better performance compared with 10 traditional machine learning or deep learning-based LBVS methods. Further ablation studies demonstrate the significant contribution of molecular conformation for bioactivity prediction, as well as the reasonability and non-redundancy of deep learning architecture in EquiVS. Finally, a model interpretation case study on CDK2 shows the potential of EquiVS in optimal conformer discovery. The overall study shows that our proposed benchmark dataset and EquiVS method have promising prospects in virtual screening applications. MDPI 2023-08-09 /pmc/articles/PMC10459669/ /pubmed/37630234 http://dx.doi.org/10.3390/molecules28165982 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Gu, Yaowen
Li, Jiao
Kang, Hongyu
Zhang, Bowen
Zheng, Si
Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title_full Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title_fullStr Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title_full_unstemmed Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title_short Employing Molecular Conformations for Ligand-Based Virtual Screening with Equivariant Graph Neural Network and Deep Multiple Instance Learning
title_sort employing molecular conformations for ligand-based virtual screening with equivariant graph neural network and deep multiple instance learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10459669/
https://www.ncbi.nlm.nih.gov/pubmed/37630234
http://dx.doi.org/10.3390/molecules28165982
work_keys_str_mv AT guyaowen employingmolecularconformationsforligandbasedvirtualscreeningwithequivariantgraphneuralnetworkanddeepmultipleinstancelearning
AT lijiao employingmolecularconformationsforligandbasedvirtualscreeningwithequivariantgraphneuralnetworkanddeepmultipleinstancelearning
AT kanghongyu employingmolecularconformationsforligandbasedvirtualscreeningwithequivariantgraphneuralnetworkanddeepmultipleinstancelearning
AT zhangbowen employingmolecularconformationsforligandbasedvirtualscreeningwithequivariantgraphneuralnetworkanddeepmultipleinstancelearning
AT zhengsi employingmolecularconformationsforligandbasedvirtualscreeningwithequivariantgraphneuralnetworkanddeepmultipleinstancelearning