A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation

Deep learning has brought a rapid development in the aspect of molecular representation for various tasks, such as molecular property prediction. The prediction of molecular properties is a crucial task in the field of drug discovery for finding specific drugs with good pharmacological activity and...

Bibliographic Details
Main authors: Li, Chunyan, Feng, Jihua, Liu, Shihu, Yao, Junfeng
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8843876/
https://www.ncbi.nlm.nih.gov/pubmed/35178082
http://dx.doi.org/10.1155/2022/8464452
_version_ 1784651360394805248
author Li, Chunyan
Feng, Jihua
Liu, Shihu
Yao, Junfeng
author_facet Li, Chunyan
Feng, Jihua
Liu, Shihu
Yao, Junfeng
author_sort Li, Chunyan
collection PubMed
description Deep learning has brought a rapid development in the aspect of molecular representation for various tasks, such as molecular property prediction. The prediction of molecular properties is a crucial task in the field of drug discovery for finding specific drugs with good pharmacological activity and pharmacokinetic properties. SMILES string is always used as a kind of character approach in deep neural network models, inspired by natural language processing techniques. However, the deep learning models are hindered by the nonunique nature of the SMILES string. To efficiently learn molecular features along all message paths, in this paper we encode multiple SMILES for every molecule as an automated data augmentation for the prediction of molecular properties, which alleviates the overfitting problem caused by the small amount of data in the datasets of molecular property prediction. As a result, by using the multiple SMILES-based augmentation, we obtained better molecular representation and showed superior performance in the tasks of predicting molecular properties.
format Online
Article
Text
id pubmed-8843876
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-88438762022-02-16 A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation Li, Chunyan Feng, Jihua Liu, Shihu Yao, Junfeng Comput Intell Neurosci Research Article Deep learning has brought a rapid development in the aspect of molecular representation for various tasks, such as molecular property prediction. The prediction of molecular properties is a crucial task in the field of drug discovery for finding specific drugs with good pharmacological activity and pharmacokinetic properties. SMILES string is always used as a kind of character approach in deep neural network models, inspired by natural language processing techniques. However, the deep learning models are hindered by the nonunique nature of the SMILES string. To efficiently learn molecular features along all message paths, in this paper we encode multiple SMILES for every molecule as an automated data augmentation for the prediction of molecular properties, which alleviates the overfitting problem caused by the small amount of data in the datasets of molecular property prediction. As a result, by using the multiple SMILES-based augmentation, we obtained better molecular representation and showed superior performance in the tasks of predicting molecular properties. Hindawi 2022-01-28 /pmc/articles/PMC8843876/ /pubmed/35178082 http://dx.doi.org/10.1155/2022/8464452 Text en Copyright © 2022 Chunyan Li et al. https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Li, Chunyan
Feng, Jihua
Liu, Shihu
Yao, Junfeng
A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title_full A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title_fullStr A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title_full_unstemmed A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title_short A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation
title_sort novel molecular representation learning for molecular property prediction with a multiple smiles-based augmentation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8843876/
https://www.ncbi.nlm.nih.gov/pubmed/35178082
http://dx.doi.org/10.1155/2022/8464452