Identifying glycan motifs using a novel subtree mining approach

Bibliographic Details
Main authors: Coff, Lachlan; Chan, Jeffrey; Ramsland, Paul A.; Guy, Andrew J.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7001330/
https://www.ncbi.nlm.nih.gov/pubmed/32019496
http://dx.doi.org/10.1186/s12859-020-3374-4
collection PubMed
description BACKGROUND: Glycans are complex sugar chains, crucial to many biological processes. By participating in binding interactions with proteins, glycans often play key roles in host–pathogen interactions. The specificities of glycan-binding proteins, such as lectins and antibodies, are governed by motifs within larger glycan structures, and improved characterisations of these determinants would aid research into human diseases. Identification of motifs has previously been approached as a frequent subtree mining problem, and we extend these approaches with a glycan notation that allows recognition of terminal motifs.
RESULTS: In this work, we customised a frequent subtree mining approach by altering the glycan notation to include information on terminal connections. This allows specific identification of terminal residues as potential motifs, better capturing the complexity of glycan-binding interactions. We achieved this by including additional nodes in a graph representation of the glycan structure to indicate the presence or absence of a linkage at particular backbone carbon positions. Combining this frequent subtree mining approach with a state-of-the-art feature selection algorithm termed minimum-redundancy, maximum-relevance (mRMR), we have generated a classification pipeline that is trained on data from a glycan microarray. When applied to a set of commonly used lectins, the identified motifs were consistent with known binding determinants. Furthermore, logistic regression classifiers trained using these motifs performed well across most lectins examined, with a median AUC value of 0.89.
CONCLUSIONS: We present here a new subtree mining approach for the classification of glycan binding and identification of potential binding motifs. The Carbohydrate Classification Accounting for Restricted Linkages (CCARL) method will assist in the interpretation of glycan microarray experiments and will aid in the discovery of novel binding motifs for further experimental characterisation.
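The abstract describes two ideas: adding nodes that mark the presence or absence of a linkage at backbone carbon positions, so that terminal residues become matchable motifs, and ranking motif features with mRMR. The Python sketch below illustrates both under stated assumptions. The `Node` structure, the fixed `POSITIONS` tuple, the `-` marker label, all helper names, the toy glycans and the binding labels are hypothetical and are not taken from CCARL. Candidate motifs are written by hand rather than mined by frequent subtree mining, and the difference form of mRMR (mutual-information relevance minus mean redundancy) is assumed, since the abstract does not state which variant is used.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import log2

# Hypothetical set of backbone carbon positions inspected on every residue.
# The published method restricts positions per monosaccharide; this sketch does not.
POSITIONS = (2, 3, 4, 6)
ABSENT = "-"  # hypothetical label for an explicit "no linkage at this position" node

@dataclass
class Node:
    label: str  # monosaccharide name, or ABSENT
    # parent carbon position -> child residue (anomeric configuration not modelled)
    children: dict = field(default_factory=dict)

def augment(node):
    """Copy the tree, adding an ABSENT child at every unoccupied position."""
    out = Node(node.label)
    if node.label == ABSENT:
        return out
    for pos in POSITIONS:
        child = node.children.get(pos)
        out.children[pos] = augment(child) if child is not None else Node(ABSENT)
    return out

def matches_at(motif, node):
    """True if the motif subtree is rooted at `node` (same labels at same positions)."""
    if motif.label != node.label:
        return False
    return all(pos in node.children and matches_at(sub, node.children[pos])
               for pos, sub in motif.children.items())

def contains(tree, motif):
    """True if the motif occurs rooted at any node of the tree."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if matches_at(motif, node):
            return True
        stack.extend(node.children.values())
    return False

def mutual_info(xs, ys):
    """Mutual information (bits) between two discrete sequences of equal length."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mrmr(features, labels, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_info(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_info(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy glycans (hypothetical examples, not microarray data).
lacnac = Node("GlcNAc", {4: Node("Gal")})                       # Gal(b1-4)GlcNAc
sialyl = Node("GlcNAc", {4: Node("Gal", {3: Node("Neu5Ac")})})  # Neu5Ac(a2-3)Gal(b1-4)GlcNAc
bare = Node("GlcNAc")
h2 = Node("GlcNAc", {4: Node("Gal", {2: Node("Fuc")})})         # Fuc(a1-2)Gal(b1-4)GlcNAc
lactose = Node("Glc", {4: Node("Gal")})                          # Gal(b1-4)Glc
glycans = [lacnac, sialyl, bare, h2, lactose]
labels = [1, 0, 0, 0, 1]  # made-up binding calls, for illustration only

# Hand-written candidate motifs; a real pipeline would mine frequent subtrees instead.
terminal_gal = Node("Gal", {p: Node(ABSENT) for p in POSITIONS})
motifs = {"terminal_Gal": terminal_gal, "any_Gal": Node("Gal"), "GlcNAc": Node("GlcNAc")}
features = {name: [int(contains(augment(g), m)) for g in glycans]
            for name, m in motifs.items()}
print(features)
print(mrmr(features, labels, k=2))
```

With the extra "no linkage" nodes, the `terminal_Gal` motif separates terminal galactose from galactose capped by sialic acid or fucose, which a plain `Gal` motif cannot do. The abstract reports training logistic regression classifiers on the selected motifs; that step is not reproduced in this sketch.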
id pubmed-7001330
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7001330 2020-02-10 Identifying glycan motifs using a novel subtree mining approach. Coff, Lachlan; Chan, Jeffrey; Ramsland, Paul A.; Guy, Andrew J. BMC Bioinformatics, Methodology Article. BioMed Central 2020-02-04 /pmc/articles/PMC7001330/ /pubmed/32019496 http://dx.doi.org/10.1186/s12859-020-3374-4 Text en
© The Author(s) 2020. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article