Molecular Descriptors Property Prediction Using Transformer-Based Approach
In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. Particularly, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES stri...
Main authors: | Tran, Tuan; Ekenna, Chinwe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10419034/ https://www.ncbi.nlm.nih.gov/pubmed/37569322 http://dx.doi.org/10.3390/ijms241511948 |
_version_ | 1785088413302521856 |
---|---|
author | Tran, Tuan Ekenna, Chinwe |
author_facet | Tran, Tuan Ekenna, Chinwe |
author_sort | Tran, Tuan |
collection | PubMed |
description | In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. Particularly, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, our model capitalizes on the Masked Language Model, which is widely used in natural language processing, for learning molecular chemical space representations. During the fine-tuning stage, our model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model demonstrates comparable performance to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce the computational overhead, we propose a new approach taking advantage of 3D compound structures for calculating the attention score used in the end-to-end transformer model to predict anti-malaria drug candidates. The results show that using the proposed attention score, our end-to-end model is able to have comparable performance with pre-trained models. |
format | Online Article Text |
id | pubmed-10419034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10419034 2023-08-12 Molecular Descriptors Property Prediction Using Transformer-Based Approach Tran, Tuan Ekenna, Chinwe Int J Mol Sci Article In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. Particularly, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, our model capitalizes on the Masked Language Model, which is widely used in natural language processing, for learning molecular chemical space representations. During the fine-tuning stage, our model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model demonstrates comparable performance to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce the computational overhead, we propose a new approach taking advantage of 3D compound structures for calculating the attention score used in the end-to-end transformer model to predict anti-malaria drug candidates. The results show that using the proposed attention score, our end-to-end model is able to have comparable performance with pre-trained models. MDPI 2023-07-26 /pmc/articles/PMC10419034/ /pubmed/37569322 http://dx.doi.org/10.3390/ijms241511948 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Tran, Tuan Ekenna, Chinwe Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title | Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title_full | Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title_fullStr | Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title_full_unstemmed | Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title_short | Molecular Descriptors Property Prediction Using Transformer-Based Approach |
title_sort | molecular descriptors property prediction using transformer-based approach |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10419034/ https://www.ncbi.nlm.nih.gov/pubmed/37569322 http://dx.doi.org/10.3390/ijms241511948 |
work_keys_str_mv | AT trantuan moleculardescriptorspropertypredictionusingtransformerbasedapproach AT ekennachinwe moleculardescriptorspropertypredictionusingtransformerbasedapproach |