A systematic study of key elements underlying molecular property prediction

Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advanceme...

Bibliographic Details
Main authors: Deng, Jianyuan, Yang, Zhibo, Wang, Hehe, Ojima, Iwao, Samaras, Dimitris, Wang, Fusheng
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575948/
https://www.ncbi.nlm.nih.gov/pubmed/37833262
http://dx.doi.org/10.1038/s41467-023-41948-6
_version_ 1785121021039214592
author Deng, Jianyuan
Yang, Zhibo
Wang, Hehe
Ojima, Iwao
Samaras, Dimitris
Wang, Fusheng
author_sort Deng, Jianyuan
collection PubMed
description Artificial intelligence (AI) has been widely applied in drug discovery with a major task as molecular property prediction. Despite booming techniques in molecular representation learning, key elements underlying molecular property prediction remain largely unexplored, which impedes further advancements in this field. Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature. To investigate the predictive power in low-data and high-data space, a series of descriptors datasets of varying sizes are also assembled to evaluate the models. In total, we have trained 62,820 models, including 50,220 models on fixed representations, 4200 models on SMILES sequences and 8400 models on molecular graphs. Based on extensive experimentation and rigorous comparison, we show that representation learning models exhibit limited performance in molecular property prediction in most datasets. Besides, multiple key elements underlying molecular property prediction can affect the evaluation results. Furthermore, we show that activity cliffs can significantly impact model prediction. Finally, we explore into potential causes why representation learning models can fail and show that dataset size is essential for representation learning models to excel.
format Online
Article
Text
id pubmed-10575948
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10575948 2023-10-15 A systematic study of key elements underlying molecular property prediction Deng, Jianyuan Yang, Zhibo Wang, Hehe Ojima, Iwao Samaras, Dimitris Wang, Fusheng Nat Commun Article
Nature Publishing Group UK 2023-10-13 /pmc/articles/PMC10575948/ /pubmed/37833262 http://dx.doi.org/10.1038/s41467-023-41948-6 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A systematic study of key elements underlying molecular property prediction
title_sort systematic study of key elements underlying molecular property prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10575948/
https://www.ncbi.nlm.nih.gov/pubmed/37833262
http://dx.doi.org/10.1038/s41467-023-41948-6