Mapping the glycosyltransferase fold landscape using interpretable deep learning
Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mappin...
Main Authors: | Taujale, Rahil; Zhou, Zhongliang; Yeung, Wayland; Moremen, Kelley W.; Li, Sheng; Kannan, Natarajan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2021
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8476585/ https://www.ncbi.nlm.nih.gov/pubmed/34580305 http://dx.doi.org/10.1038/s41467-021-25975-9 |
_version_ | 1784575649876279296 |
---|---|
author | Taujale, Rahil Zhou, Zhongliang Yeung, Wayland Moremen, Kelley W. Li, Sheng Kannan, Natarajan |
author_facet | Taujale, Rahil Zhou, Zhongliang Yeung, Wayland Moremen, Kelley W. Li, Sheng Kannan, Natarajan |
author_sort | Taujale, Rahil |
collection | PubMed |
description | Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies. |
format | Online Article Text |
id | pubmed-8476585 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-84765852021-10-22 Mapping the glycosyltransferase fold landscape using interpretable deep learning Taujale, Rahil Zhou, Zhongliang Yeung, Wayland Moremen, Kelley W. Li, Sheng Kannan, Natarajan Nat Commun Article Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies. 
Nature Publishing Group UK 2021-09-27 /pmc/articles/PMC8476585/ /pubmed/34580305 http://dx.doi.org/10.1038/s41467-021-25975-9 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Taujale, Rahil Zhou, Zhongliang Yeung, Wayland Moremen, Kelley W. Li, Sheng Kannan, Natarajan Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title | Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title_full | Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title_fullStr | Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title_full_unstemmed | Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title_short | Mapping the glycosyltransferase fold landscape using interpretable deep learning |
title_sort | mapping the glycosyltransferase fold landscape using interpretable deep learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8476585/ https://www.ncbi.nlm.nih.gov/pubmed/34580305 http://dx.doi.org/10.1038/s41467-021-25975-9 |
work_keys_str_mv | AT taujalerahil mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning AT zhouzhongliang mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning AT yeungwayland mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning AT moremenkelleyw mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning AT lisheng mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning AT kannannatarajan mappingtheglycosyltransferasefoldlandscapeusinginterpretabledeeplearning |