Recent advances in the self-referencing embedded strings (SELFIES) library
String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to synta...
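The syntactic fragility of SMILES mentioned in the abstract can be illustrated with a small, self-contained sketch. It does not use the selfies library and is not a SMILES parser. It assumes only two well-known SMILES rules (branch parentheses must balance, and ring-closure labels must occur in pairs), and the function name `smiles_syntax_problems` is hypothetical, chosen for this illustration.

```python
DIGITS = "0123456789"


def smiles_syntax_problems(smiles):
    """Return a list of problems found by two simple syntactic checks.

    Toy illustration only: it checks parenthesis balance and ring-closure
    pairing, and knows nothing about valence or other chemistry.
    """
    problems = []
    depth = 0            # current branch nesting depth
    open_rings = set()   # ring-closure labels still waiting for a partner
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside them are isotopes,
            # hydrogen counts or charges, not ring closures.
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed bracket atom")
                break
            i = end + 1
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                problems.append("')' without matching '('")
            else:
                depth -= 1
        elif ch == "%":
            # Two-digit ring-closure label written as %nn.
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and all(c in DIGITS for c in label):
                open_rings ^= {label}   # toggle: open on first use, close on second
                i += 3
                continue
            problems.append("malformed '%nn' ring label")
        elif ch in DIGITS:
            open_rings ^= {ch}          # toggle single-digit ring label
        i += 1
    if depth > 0:
        problems.append("unclosed '('")
    if open_rings:
        labels = ", ".join(sorted(open_rings))
        problems.append("unpaired ring-closure label(s): " + labels)
    return problems


if __name__ == "__main__":
    for s in ["c1ccccc1", "c1ccccc", "CC(=O)O", "CC(C"]:
        print(s, smiles_syntax_problems(s))
```

A string that loses a single character, such as the final `1` of benzene (`c1ccccc1`), already fails these checks, which is the kind of error a generative model can produce when emitting SMILES token by token.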
Main authors: | Lo, Alston; Pollice, Robert; Nigam, AkshatKumar; White, Andrew D.; Krenn, Mario; Aspuru-Guzik, Alán |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | RSC 2023 |
Subjects: | Chemistry |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408573/ https://www.ncbi.nlm.nih.gov/pubmed/38013816 http://dx.doi.org/10.1039/d3dd00044c |
_version_ | 1785086191241003008 |
---|---|
author | Lo, Alston Pollice, Robert Nigam, AkshatKumar White, Andrew D. Krenn, Mario Aspuru-Guzik, Alán |
author_sort | Lo, Alston |
collection | PubMed |
description | String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies). |
format | Online Article Text |
id | pubmed-10408573 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | RSC |
record_format | MEDLINE/PubMed |
spelling | pubmed-104085732023-08-09 Recent advances in the self-referencing embedded strings (SELFIES) library Lo, Alston Pollice, Robert Nigam, AkshatKumar White, Andrew D. Krenn, Mario Aspuru-Guzik, Alán Digit Discov Chemistry String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies). RSC 2023-07-01 /pmc/articles/PMC10408573/ /pubmed/38013816 http://dx.doi.org/10.1039/d3dd00044c Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by/3.0/ |
spellingShingle | Chemistry Lo, Alston Pollice, Robert Nigam, AkshatKumar White, Andrew D. Krenn, Mario Aspuru-Guzik, Alán Recent advances in the self-referencing embedded strings (SELFIES) library |
title | Recent advances in the self-referencing embedded strings (SELFIES) library |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10408573/ https://www.ncbi.nlm.nih.gov/pubmed/38013816 http://dx.doi.org/10.1039/d3dd00044c |