
Canonicalizing BigSMILES for Polymers with Defined Backbones

BigSMILES, a line notation for encapsulating the molecular structure of stochastic molecules such as polymers, was recently proposed as a compact and readable solution for writing macromolecules. While BigSMILES strings serve as useful identifiers for reconstructing the molecular c...


Bibliographic Details
Main authors: Lin, Tzyy-Shyang; Rebello, Nathan J.; Lee, Guang-He; Morris, Melody A.; Olsen, Bradley D.
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9761857/
https://www.ncbi.nlm.nih.gov/pubmed/36561286
http://dx.doi.org/10.1021/acspolymersau.2c00009
author Lin, Tzyy-Shyang
Rebello, Nathan J.
Lee, Guang-He
Morris, Melody A.
Olsen, Bradley D.
collection PubMed
description BigSMILES, a line notation for encapsulating the molecular structure of stochastic molecules such as polymers, was recently proposed as a compact and readable solution for writing macromolecules. While BigSMILES strings serve as useful identifiers for reconstructing the molecular connectivity for polymers, in general, BigSMILES allows the same polymer to be codified into multiple equally valid representations. Having a canonicalization scheme that eliminates the multiplicity would be very useful in reducing time-intensive tasks like structural comparison and molecular search into simple string-matching tasks. Motivated by this, in this work, two strategies for deriving canonical representations for linear polymers are proposed. In the first approach, a canonicalization scheme is proposed to standardize the expression of BigSMILES stochastic objects, thereby standardizing the expression of overall BigSMILES strings. In the second approach, an analogy between formal language theory and the molecular ensemble of polymer molecules is drawn. Linear polymers can be converted into regular languages, and the minimal deterministic finite automaton uniquely associated with each prescribed language is used as the basis for constructing the unique text identifier associated with each distinct polymer. Overall, this work presents algorithms to convert linear polymers into unique structure-based text identifiers. The derived identifiers can be readily applied in chemical information systems for polymers and other polymer informatics applications.
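The second approach in the description relies on a standard fact from automata theory: every regular language has a minimal deterministic finite automaton (DFA) that is unique up to renaming of states, so serializing that automaton in a fixed state order gives one string per language. The following is a minimal sketch of that idea only, not the authors' algorithm. It assumes a toy alphabet in which the hypothetical tokens `A` and `B` stand in for repeat units, uses Moore's partition refinement for minimization, and uses a made-up serialization format; the function name `canonical_id` is likewise illustrative.

```python
from collections import deque


def canonical_id(symbols, delta, start, accepting):
    """Serialize the minimal DFA of a partial DFA in a canonical state order.

    delta maps (state, symbol) -> state; a missing entry means "reject".
    State names may be any hashable value except None.
    """
    symbols = sorted(symbols)

    # 1. Keep only states reachable from the start state.
    reachable, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for a in symbols:
            t = delta.get((s, a))
            if t is not None and t not in reachable:
                reachable.add(t)
                queue.append(t)

    # 2. Moore partition refinement; None is the implicit dead state.
    states = list(reachable) + [None]
    block = {s: int(s in accepting) for s in states}
    while True:
        sig = {s: (block[s], *(block[delta.get((s, a))] for a in symbols))
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        n_old = len(set(block.values()))
        block = {s: ids[sig[s]] for s in states}
        if len(ids) == n_old:  # no block was split, so the partition is stable
            break

    # 3. Number the blocks by breadth-first search in sorted symbol order.
    dead = block[None]
    rep = {}
    for s in states:
        rep.setdefault(block[s], s)  # one representative state per block
    order, queue, edges = {}, deque(), []
    if block[start] != dead:
        order[block[start]] = 0
        queue.append(block[start])
    while queue:
        b = queue.popleft()
        for a in symbols:
            t = block[delta.get((rep[b], a))]
            if t == dead:
                continue
            if t not in order:
                order[t] = len(order)
                queue.append(t)
            edges.append((order[b], a, order[t]))
    finals = sorted({order[block[s]] for s in reachable if s in accepting})

    body = ";".join(f"{i}-{a}->{j}" for i, a, j in edges)
    return f"{body}|F={','.join(map(str, finals))}"


# Two different automata for the same toy language (AB)*.
compact = canonical_id("AB", {(0, "A"): 1, (1, "B"): 0}, 0, {0})
unrolled = canonical_id(
    "AB",
    {("q0", "A"): "q1", ("q1", "B"): "q2", ("q2", "A"): "q3", ("q3", "B"): "q2"},
    "q0",
    {"q0", "q2"},
)
print(compact)              # 0-A->1;1-B->0|F=0
print(compact == unrolled)  # True
```

Both inputs describe the same set of chains, so they collapse to the same string, and comparing two such descriptions reduces to string matching. Mapping real BigSMILES stochastic objects onto an alphabet and transitions is the substantive part of the published method and is not attempted here.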
format Online
Article
Text
id pubmed-9761857
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9761857 2022-12-20 Canonicalizing BigSMILES for Polymers with Defined Backbones. Lin, Tzyy-Shyang; Rebello, Nathan J.; Lee, Guang-He; Morris, Melody A.; Olsen, Bradley D. ACS Polym Au. American Chemical Society 2022-10-14 /pmc/articles/PMC9761857/ /pubmed/36561286 http://dx.doi.org/10.1021/acspolymersau.2c00009 Text en © 2022 The Authors. Published by American Chemical Society.
License: https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title Canonicalizing BigSMILES for Polymers with Defined Backbones