Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various prop...

Bibliographic Details
Main Authors: Jiang, Dejun; Wu, Zhenxing; Hsieh, Chang-Yu; Chen, Guangyong; Liao, Ben; Wang, Zhe; Shen, Chao; Cao, Dongsheng; Wu, Jian; Hou, Tingjun
Format: Online Article Text
Language: English
Published: Springer International Publishing 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7888189/
https://www.ncbi.nlm.nih.gov/pubmed/33597034
http://dx.doi.org/10.1186/s13321-020-00479-8
author Jiang, Dejun
Wu, Zhenxing
Hsieh, Chang-Yu
Chen, Guangyong
Liao, Ben
Wang, Zhe
Shen, Chao
Cao, Dongsheng
Wu, Jian
Hou, Tingjun
author_sort Jiang, Dejun
collection PubMed
description Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of prediction models developed with eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that, on average, the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, yield outstanding performance on a fraction of the larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model, even for a large dataset. Model interpretation with the SHAP method can effectively explore established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) against HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
format Online
Article
Text
id pubmed-7888189
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7888189 2021-02-22 J Cheminform Research Article Springer International Publishing 2021-02-17 /pmc/articles/PMC7888189/ /pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models
topic Research Article