Mapping the glycosyltransferase fold landscape using interpretable deep learning

Bibliographic Details
Main authors: Taujale, Rahil, Zhou, Zhongliang, Yeung, Wayland, Moremen, Kelley W., Li, Sheng, Kannan, Natarajan
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8476585/
https://www.ncbi.nlm.nih.gov/pubmed/34580305
http://dx.doi.org/10.1038/s41467-021-25975-9
collection PubMed
description Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
id pubmed-8476585
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling Nat Commun, Nature Publishing Group UK, 2021-09-27. /pmc/articles/PMC8476585/ /pubmed/34580305 http://dx.doi.org/10.1038/s41467-021-25975-9
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
topic Article