Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models
Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various prop...
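Main authors: | Jiang, Dejun; Wu, Zhenxing; Hsieh, Chang-Yu; Chen, Guangyong; Liao, Ben; Wang, Zhe; Shen, Chao; Cao, Dongsheng; Wu, Jian; Hou, Tingjun |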
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2021 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7888189/ https://www.ncbi.nlm.nih.gov/pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8 |
_version_ | 1783652119957471232 |
---|---|
author | Jiang, Dejun; Wu, Zhenxing; Hsieh, Chang-Yu; Chen, Guangyong; Liao, Ben; Wang, Zhe; Shen, Chao; Cao, Dongsheng; Wu, Jian; Hou, Tingjun |
author_sort | Jiang, Dejun |
collection | PubMed |
description | Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs can yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of prediction models developed with eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that, on average, the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, yield outstanding performance on a fraction of the larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model, even for a large dataset. Model interpretation with the SHAP method can effectively recover established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability. |
format | Online Article Text |
id | pubmed-7888189 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-7888189 2021-02-22 J Cheminform Research Article Springer International Publishing 2021-02-17 /pmc/articles/PMC7888189/ /pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7888189/ https://www.ncbi.nlm.nih.gov/pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8 |