
SELFIES and the future of molecular string representations

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs...

Bibliographic Details
Main authors: Krenn, Mario, Ai, Qianxiang, Barthel, Senja, Carson, Nessa, Frei, Angelo, Frey, Nathan C., Friederich, Pascal, Gaudin, Théophile, Gayle, Alberto Alexander, Jablonka, Kevin Maik, Lameiro, Rafael F., Lemm, Dominik, Lo, Alston, Moosavi, Seyed Mohamad, Nápoles-Duarte, José Manuel, Nigam, AkshatKumar, Pollice, Robert, Rajan, Kohulan, Schatzschneider, Ulrich, Schwaller, Philippe, Skreta, Marta, Smit, Berend, Strieth-Kalthoff, Felix, Sun, Chong, Tom, Gary, Falk von Rudorff, Guido, Wang, Andrew, White, Andrew D., Young, Adamo, Yu, Rose, Aspuru-Guzik, Alán
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9583042/
https://www.ncbi.nlm.nih.gov/pubmed/36277819
http://dx.doi.org/10.1016/j.patter.2022.100588
author Krenn, Mario
Ai, Qianxiang
Barthel, Senja
Carson, Nessa
Frei, Angelo
Frey, Nathan C.
Friederich, Pascal
Gaudin, Théophile
Gayle, Alberto Alexander
Jablonka, Kevin Maik
Lameiro, Rafael F.
Lemm, Dominik
Lo, Alston
Moosavi, Seyed Mohamad
Nápoles-Duarte, José Manuel
Nigam, AkshatKumar
Pollice, Robert
Rajan, Kohulan
Schatzschneider, Ulrich
Schwaller, Philippe
Skreta, Marta
Smit, Berend
Strieth-Kalthoff, Felix
Sun, Chong
Tom, Gary
Falk von Rudorff, Guido
Wang, Andrew
White, Andrew D.
Young, Adamo
Yu, Rose
Aspuru-Guzik, Alán
collection PubMed
description Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings, most pertinently that most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded string (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
format Online
Article
Text
id pubmed-9583042
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9583042 2022-10-21 SELFIES and the future of molecular string representations Patterns (N Y) Perspective Elsevier 2022-10-14 /pmc/articles/PMC9583042/ /pubmed/36277819 http://dx.doi.org/10.1016/j.patter.2022.100588 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title SELFIES and the future of molecular string representations
topic Perspective
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9583042/
https://www.ncbi.nlm.nih.gov/pubmed/36277819
http://dx.doi.org/10.1016/j.patter.2022.100588