Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various prop...

Bibliographic Details
Main Authors:	Jiang, Dejun, Wu, Zhenxing, Hsieh, Chang-Yu, Chen, Guangyong, Liao, Ben, Wang, Zhe, Shen, Chao, Cao, Dongsheng, Wu, Jian, Hou, Tingjun
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7888189/ https://www.ncbi.nlm.nih.gov/pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8

collection	PubMed
description	Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of prediction models developed with eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that, on average, the descriptor-based models outperform the graph-based models in terms of both prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance on a fraction of the larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model, even on a large dataset. Model interpretation with the SHAP method can effectively explore established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) against HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.
id	pubmed-7888189
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-7888189 (2021-02-22). J Cheminform. Research Article. Springer International Publishing, 2021-02-17. /pmc/articles/PMC7888189/ /pubmed/33597034 http://dx.doi.org/10.1186/s13321-020-00479-8. Text, en.
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Similar Items