Molecular Descriptors, Structure Generation, and Inverse QSAR/QSPR Based on SELFIES
For inverse QSAR/QSPR in conventional molecular design, several chemical structures must be generated and their molecular descriptors must be calculated. However, there is no one-to-one correspondence between the generated chemical structures and molecular descriptors. In this pape...
Main author: | Kaneko, Hiromasa |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10286088/ https://www.ncbi.nlm.nih.gov/pubmed/37360490 http://dx.doi.org/10.1021/acsomega.3c01332 |
_version_ | 1785061680022028288 |
---|---|
author | Kaneko, Hiromasa |
collection | PubMed |
description | For inverse QSAR/QSPR in conventional molecular design, several chemical structures must be generated and their molecular descriptors must be calculated. However, there is no one-to-one correspondence between the generated chemical structures and molecular descriptors. In this paper, molecular descriptors, structure generation, and inverse QSAR/QSPR based on self-referencing embedded strings (SELFIES), a 100% robust molecular string representation, are proposed. A one-hot vector is converted from SELFIES to SELFIES descriptors x, and an inverse analysis of the QSAR/QSPR model y = f(x) with the objective variable y and molecular descriptor x is conducted. Thus, x values that achieve a target y value are obtained. Based on these values, SELFIES strings or molecules are generated, meaning that inverse QSAR/QSPR is performed successfully. The SELFIES descriptors and SELFIES-based structure generation are verified using datasets of actual compounds. The successful construction of SELFIES-descriptor-based QSAR/QSPR models with predictive abilities comparable to those of models based on other fingerprints is confirmed. A large number of molecules with one-to-one relationships with the values of the SELFIES descriptors are generated. Furthermore, as a case study of inverse QSAR/QSPR, molecules with target y values are generated successfully. The Python code for the proposed method is available at https://github.com/hkaneko1985/dcekit. |
format | Online Article Text |
id | pubmed-10286088 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-10286088 2023-06-23 Molecular Descriptors, Structure Generation, and Inverse QSAR/QSPR Based on SELFIES Kaneko, Hiromasa ACS Omega American Chemical Society 2023-06-05 /pmc/articles/PMC10286088/ /pubmed/37360490 http://dx.doi.org/10.1021/acsomega.3c01332 Text en © 2023 The Author. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Molecular Descriptors, Structure Generation, and Inverse QSAR/QSPR Based on SELFIES |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10286088/ https://www.ncbi.nlm.nih.gov/pubmed/37360490 http://dx.doi.org/10.1021/acsomega.3c01332 |
work_keys_str_mv | AT kanekohiromasa moleculardescriptorsstructuregenerationandinverseqsarqsprbasedonselfies |
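
The description above says that a one-hot vector is converted from SELFIES into SELFIES descriptors x, and that x values are turned back into SELFIES strings. The following is a minimal sketch of that round trip, not the author's implementation (which the record says is published in dcekit). The function names `split_symbols`, `build_alphabet`, `encode`, and `decode` are hypothetical, the use of `[nop]` as a padding symbol is an assumption borrowed from common SELFIES practice, and the argmax decoding of real-valued x is an assumption about how values returned by an inverse analysis could be mapped back to symbols.

```python
import re

PAD = "[nop]"  # assumed padding symbol; it carries no structural meaning


def split_symbols(selfies):
    """Split a SELFIES string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies)


def build_alphabet(selfies_list):
    """Collect every symbol seen in the data, with the padding symbol first."""
    symbols = {sym for s in selfies_list for sym in split_symbols(s)}
    symbols.discard(PAD)
    return [PAD] + sorted(symbols)


def encode(selfies, alphabet, max_len):
    """Return a flat one-hot descriptor vector of length max_len * len(alphabet)."""
    symbols = split_symbols(selfies)
    if len(symbols) > max_len:
        raise ValueError("SELFIES string is longer than max_len symbols")
    symbols = symbols + [PAD] * (max_len - len(symbols))
    index = {sym: i for i, sym in enumerate(alphabet)}
    x = []
    for sym in symbols:
        row = [0] * len(alphabet)
        row[index[sym]] = 1
        x.extend(row)
    return x


def decode(x, alphabet):
    """Map a descriptor vector back to a SELFIES string.

    Each block of len(alphabet) values is reduced by argmax, so real-valued
    x (hypothetically, the output of an inverse analysis) also decodes.
    """
    n = len(alphabet)
    if len(x) % n != 0:
        raise ValueError("descriptor length is not a multiple of the alphabet size")
    symbols = []
    for start in range(0, len(x), n):
        block = x[start:start + n]
        best = max(range(n), key=block.__getitem__)
        if alphabet[best] != PAD:
            symbols.append(alphabet[best])
    return "".join(symbols)


# Illustrative strings: ethanol and benzene written in SELFIES.
training = ["[C][C][O]", "[C][=C][C][=C][C][=C][Ring1][=Branch1]"]
alphabet = build_alphabet(training)
max_len = max(len(split_symbols(s)) for s in training)

x = encode(training[0], alphabet, max_len)
restored = decode(x, alphabet)
print(len(x), restored)
```

Because every position in the string owns its own block of the vector, encoding followed by decoding returns the original string, which is the one-to-one relationship between descriptors and structures that the description emphasizes. Converting the decoded string into a molecule would additionally require a SELFIES decoder, which this sketch leaves out.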