GlyLES: Grammar-based Parsing of Glycans from IUPAC-condensed to SMILES

Bibliographic Details
Main authors: Joeres, Roman; Bojar, Daniel; Kalinina, Olga V.
Format: Online, Article, Text
Language: English
Published: Springer International Publishing, 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035253/
https://www.ncbi.nlm.nih.gov/pubmed/36959676
http://dx.doi.org/10.1186/s13321-023-00704-0
Collection: PubMed (National Center for Biotechnology Information), record ID pubmed-10035253
Description: Glycans are important polysaccharides found on cellular surfaces as components of glycoproteins and glycolipids. Glycosylation is one of the most common post-translational modifications of proteins in eukaryotic cells. Glycans play important roles in protein folding, cell-cell interactions, and other extracellular processes, and changes in glycan structures may influence the course of diseases such as infections or cancer. Glycans are commonly represented using the IUPAC-condensed notation, a textual representation operating on the same topological level as the Symbol Nomenclature for Glycans (SNFG), which assigns colored geometric shapes to the main monomers and connects them in tree-like structures that visualize the glycan on a topological level. An atomic-level representation, however, requires a notation such as SMILES. To our knowledge, there is no easy-to-use, general, open-source, and offline tool to convert the IUPAC-condensed notation to SMILES. Here, we present the open-access Python package GlyLES for the generalizable generation of SMILES representations from IUPAC-condensed representations. GlyLES uses a grammar to parse the monomer tree from the IUPAC-condensed notation. From this tree, the tool computes the atomic structure of each monomer based on its IUPAC-condensed description. In the last step, it merges all monomers into the atomic structure of the glycan in SMILES notation. GlyLES is the first package that allows conversion from the IUPAC-condensed notation of glycans to SMILES strings. This may have multiple applications, including straightforward visualization, substructure search, molecular modeling and docking, and a new featurization strategy for machine-learning algorithms. GlyLES is available at https://github.com/kalininalab/GlyLES. Supplementary information: the online version contains supplementary material available at 10.1186/s13321-023-00704-0.
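The grammar-based parsing step described above, reading a monomer tree out of an IUPAC-condensed string, can be illustrated with a small parser. This is a minimal sketch, not GlyLES's grammar or API: it is a hand-written recursive-descent parser for a simplified, hypothetical subset of the notation (alphanumeric residue names, links such as `(a1-3)`, and square-bracketed branches), and the names `Residue`, `tokenize`, `Parser`, `parse`, and `show` are illustrative only. It follows the usual reading of IUPAC-condensed: the rightmost residue is the reducing end and becomes the root, a link such as `(a1-3)` connects the residue on its left to the next residue on its right (its parent) through an α(1→3) glycosidic bond, and a bracketed branch attaches to the residue that follows the closing bracket.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Tokens of a simplified (hypothetical) IUPAC-condensed subset:
# "[" / "]" delimit branches, "(a1-3)" is a link, "GlcNAc" is a residue name.
TOKEN = re.compile(
    r"(?P<open>\[)|(?P<close>\])"
    r"|\((?P<link>[ab?]\d-\d)\)"
    r"|(?P<name>[A-Za-z0-9]+)"
)


@dataclass
class Residue:
    """One monomer in the tree; `link` describes the bond to its parent."""
    name: str
    link: Optional[str] = None  # None for the root (reducing end)
    children: List["Residue"] = field(default_factory=list)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split the input into (kind, value) pairs, rejecting unknown characters."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent over:  chain := NAME (LINK branch* NAME)*
                                branch := "[" chain LINK "]"        """

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def kind(self, offset: int = 0) -> Optional[str]:
        i = self.pos + offset
        return self.tokens[i][0] if i < len(self.tokens) else None

    def take(self, kind: str) -> str:
        if self.kind() != kind:
            raise ValueError(f"expected {kind} at token {self.pos}, found {self.kind()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def chain(self) -> Residue:
        # Each residue becomes a child of the residue to its right,
        # so the last residue read is the root of this chain.
        node = Residue(self.take("name"))
        # A LINK directly followed by "]" closes a branch; leave it for branch().
        while self.kind() == "link" and self.kind(1) != "close":
            link = self.take("link")
            branches = []
            while self.kind() == "open":
                branches.append(self.branch())
            parent = Residue(self.take("name"))
            node.link = link
            parent.children.append(node)
            parent.children.extend(branches)  # branches attach to the same parent
            node = parent
        return node

    def branch(self) -> Residue:
        self.take("open")
        sub = self.chain()
        sub.link = self.take("link")  # bond from the branch root to the next residue
        self.take("close")
        return sub


def parse(text: str) -> Residue:
    parser = Parser(text)
    root = parser.chain()
    if parser.pos != len(parser.tokens):
        raise ValueError(f"trailing input at token {parser.pos}")
    return root


def show(node: Residue, depth: int = 0) -> None:
    """Print the tree, indenting each child under its parent with its link."""
    prefix = f"{node.link} " if node.link else ""
    print("  " * depth + prefix + node.name)
    for child in node.children:
        show(child, depth + 1)


if __name__ == "__main__":
    show(parse("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"))
```

For the N-glycan core used in the demo, the script prints `GlcNAc` as the root, `b1-4 GlcNAc` beneath it, `b1-4 Man` beneath that, and the two mannose branches `a1-3 Man` and `a1-6 Man` at the deepest level. A hand-written parser keeps the sketch dependency-free; a full converter would also have to handle uncertain linkages, modifications, and reducing-end annotations, which this subset omits, before computing per-monomer atomic structures and merging them into SMILES.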
Journal: J Cheminform (article type: Software), published 2023-03-23 by Springer International Publishing.
License: © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.