SMG-BERT: integrating stereoscopic information and chemical representation for molecular property prediction

Molecular property prediction is a crucial task in various fields and has recently garnered significant attention. To achieve accurate and fast prediction of molecular properties, machine learning (ML) models have been widely employed due to their superior performance compared to traditional methods...

Bibliographic Details
Main authors: Zhang, Jiahui; Du, Wenjie; Yang, Xiaoting; Wu, Di; Li, Jiahe; Wang, Kun; Wang, Yang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Molecular Biosciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10348360/
https://www.ncbi.nlm.nih.gov/pubmed/37457837
http://dx.doi.org/10.3389/fmolb.2023.1216765
author Zhang, Jiahui
Du, Wenjie
Yang, Xiaoting
Wu, Di
Li, Jiahe
Wang, Kun
Wang, Yang
collection PubMed
description Molecular property prediction is a crucial task in various fields and has recently garnered significant attention. To achieve accurate and fast prediction of molecular properties, machine learning (ML) models have been widely employed because they outperform traditional trial-and-error methods. However, most existing ML models do not incorporate 3D molecular information and still need improvement, as they are mostly poor at differentiating certain types of stereoisomers, particularly chiral ones. In addition, routine featurization methods that rely on incomplete features struggle to yield interpretable molecular representations. In this paper, we propose the Stereo Molecular Graph BERT (SMG-BERT), which integrates 3D spatial geometric parameters, 2D topological information, and the 1D SMILES string into the self-attention-based BERT model. In addition, nuclear magnetic resonance (NMR) spectroscopy results and bond dissociation energy (BDE) are integrated as extra atomic and bond features to improve the model’s performance and support interpretability analysis. The comprehensive integration of 1D, 2D, and 3D information could establish a unified and unambiguous molecular characterization system that distinguishes conformations, such as those of chiral molecules. The intuitively integrated chemical information gives the model interpretability that is consistent with chemical logic. Experimental results on 12 benchmark molecular datasets show that SMG-BERT consistently outperforms existing methods. The experimental results also demonstrate that SMG-BERT is generalizable and reliable.
format Online
Article
Text
id pubmed-10348360
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10348360 2023-07-15
SMG-BERT: integrating stereoscopic information and chemical representation for molecular property prediction
Zhang, Jiahui; Du, Wenjie; Yang, Xiaoting; Wu, Di; Li, Jiahe; Wang, Kun; Wang, Yang
Front Mol Biosci
Molecular Biosciences
Frontiers Media S.A. 2023-06-30
/pmc/articles/PMC10348360/ /pubmed/37457837 http://dx.doi.org/10.3389/fmolb.2023.1216765
Text en
Copyright © 2023 Zhang, Du, Yang, Wu, Li, Wang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title SMG-BERT: integrating stereoscopic information and chemical representation for molecular property prediction
topic Molecular Biosciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10348360/
https://www.ncbi.nlm.nih.gov/pubmed/37457837
http://dx.doi.org/10.3389/fmolb.2023.1216765