Reconstruction of lossless molecular representations from fingerprints
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution...
Main authors: | Ucak, Umit V.; Ashyrmamatov, Islambek; Lee, Juyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9948316/ https://www.ncbi.nlm.nih.gov/pubmed/36823647 http://dx.doi.org/10.1186/s13321-023-00693-0 |
_version_ | 1784892755430866944 |
---|---|
author | Ucak, Umit V. Ashyrmamatov, Islambek Lee, Juyong |
author_facet | Ucak, Umit V. Ashyrmamatov, Islambek Lee, Juyong |
author_sort | Ucak, Umit V. |
collection | PubMed |
description | The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints, extended connectivity, topological torsion, atom pairs, and atomic environments can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13321-023-00693-0. |
format | Online Article Text |
id | pubmed-9948316 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-9948316 2023-02-24 Reconstruction of lossless molecular representations from fingerprints Ucak, Umit V. Ashyrmamatov, Islambek Lee, Juyong J Cheminform Research Springer International Publishing 2023-02-23 /pmc/articles/PMC9948316/ /pubmed/36823647 http://dx.doi.org/10.1186/s13321-023-00693-0 Text en © The Author(s) 2023, corrected publication 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Ucak, Umit V. Ashyrmamatov, Islambek Lee, Juyong Reconstruction of lossless molecular representations from fingerprints |
title | Reconstruction of lossless molecular representations from fingerprints |
title_full | Reconstruction of lossless molecular representations from fingerprints |
title_fullStr | Reconstruction of lossless molecular representations from fingerprints |
title_full_unstemmed | Reconstruction of lossless molecular representations from fingerprints |
title_short | Reconstruction of lossless molecular representations from fingerprints |
title_sort | reconstruction of lossless molecular representations from fingerprints |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9948316/ https://www.ncbi.nlm.nih.gov/pubmed/36823647 http://dx.doi.org/10.1186/s13321-023-00693-0 |
work_keys_str_mv | AT ucakumitv reconstructionoflosslessmolecularrepresentationsfromfingerprints AT ashyrmamatovislambek reconstructionoflosslessmolecularrepresentationsfromfingerprints AT leejuyong reconstructionoflosslessmolecularrepresentationsfromfingerprints |