A systematic study of key elements underlying molecular property prediction
Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advanceme...
Main Authors: | Deng, Jianyuan; Yang, Zhibo; Wang, Hehe; Ojima, Iwao; Samaras, Dimitris; Wang, Fusheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575948/ https://www.ncbi.nlm.nih.gov/pubmed/37833262 http://dx.doi.org/10.1038/s41467-023-41948-6 |
_version_ | 1785121021039214592 |
---|---|
author | Deng, Jianyuan Yang, Zhibo Wang, Hehe Ojima, Iwao Samaras, Dimitris Wang, Fusheng |
author_facet | Deng, Jianyuan Yang, Zhibo Wang, Hehe Ojima, Iwao Samaras, Dimitris Wang, Fusheng |
author_sort | Deng, Jianyuan |
collection | PubMed |
description | Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptors datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore into potential causes why representation learning models can fail and show that dataset size is essential for representation learning models to excel. |
format | Online Article Text |
id | pubmed-10575948 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-105759482023-10-15 A systematic study of key elements underlying molecular property prediction Deng, Jianyuan Yang, Zhibo Wang, Hehe Ojima, Iwao Samaras, Dimitris Wang, Fusheng Nat Commun Article Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptors datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore into potential causes why representation learning models can fail and show that dataset size is essential for representation learning models to excel. 
Nature Publishing Group UK 2023-10-13 /pmc/articles/PMC10575948/ /pubmed/37833262 http://dx.doi.org/10.1038/s41467-023-41948-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Deng, Jianyuan Yang, Zhibo Wang, Hehe Ojima, Iwao Samaras, Dimitris Wang, Fusheng A systematic study of key elements underlying molecular property prediction |
title | A systematic study of key elements underlying molecular property prediction |
title_full | A systematic study of key elements underlying molecular property prediction |
title_fullStr | A systematic study of key elements underlying molecular property prediction |
title_full_unstemmed | A systematic study of key elements underlying molecular property prediction |
title_short | A systematic study of key elements underlying molecular property prediction |
title_sort | systematic study of key elements underlying molecular property prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575948/ https://www.ncbi.nlm.nih.gov/pubmed/37833262 http://dx.doi.org/10.1038/s41467-023-41948-6 |
work_keys_str_mv | AT dengjianyuan asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT yangzhibo asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT wanghehe asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT ojimaiwao asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT samarasdimitris asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT wangfusheng asystematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT dengjianyuan systematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT yangzhibo systematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT wanghehe systematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT ojimaiwao systematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT samarasdimitris systematicstudyofkeyelementsunderlyingmolecularpropertyprediction AT wangfusheng systematicstudyofkeyelementsunderlyingmolecularpropertyprediction |