
Recent advances in the self-referencing embedded strings (SELFIES) library

String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to synta...


Bibliographic Details
Main authors: Lo, Alston; Pollice, Robert; Nigam, AkshatKumar; White, Andrew D.; Krenn, Mario; Aspuru-Guzik, Alán
Format: Online Article Text
Language: English
Published: RSC 2023
Subjects: Chemistry
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408573/
https://www.ncbi.nlm.nih.gov/pubmed/38013816
http://dx.doi.org/10.1039/d3dd00044c
author Lo, Alston
Pollice, Robert
Nigam, AkshatKumar
White, Andrew D.
Krenn, Mario
Aspuru-Guzik, Alán
collection PubMed
description String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies).
format Online
Article
Text
id pubmed-10408573
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher RSC
record_format MEDLINE/PubMed
spelling pubmed-10408573 2023-08-09 Digit Discov Chemistry RSC 2023-07-01 /pmc/articles/PMC10408573/ /pubmed/38013816 http://dx.doi.org/10.1039/d3dd00044c Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by/3.0/
title Recent advances in the self-referencing embedded strings (SELFIES) library
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408573/
https://www.ncbi.nlm.nih.gov/pubmed/38013816
http://dx.doi.org/10.1039/d3dd00044c