EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction

MOTIVATION: N-linked glycosylation is a frequently occurring post-translational protein modification that serves critical functions in protein folding, stability, trafficking, and recognition. Its involvement spans across multiple biological processes and alterations to this process can result in va...

Bibliographic Details
Main Authors: Hou, Xiaoyang; Wang, Yu; Bu, Dongbo; Wang, Yaojun; Sun, Shiwei
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627407/
https://www.ncbi.nlm.nih.gov/pubmed/37930896
http://dx.doi.org/10.1093/bioinformatics/btad650
_version_ 1785131521845231616
author Hou, Xiaoyang
Wang, Yu
Bu, Dongbo
Wang, Yaojun
Sun, Shiwei
collection PubMed
description MOTIVATION: N-linked glycosylation is a frequently occurring post-translational protein modification that serves critical functions in protein folding, stability, trafficking, and recognition. It is involved in multiple biological processes, and alterations to it can result in various diseases. Identifying N-linked glycosylation sites is therefore essential for understanding the mechanisms and systems underlying glycosylation. Owing to the inherent experimental complexities, machine learning and deep learning have become indispensable tools for predicting these sites. RESULTS: In this context, a new approach called EMNGly is proposed. EMNGly uses a pretrained protein language model (Evolutionary Scale Modeling) and a pretrained protein structure model (Inverse Folding Model) for feature extraction and a support vector machine for classification. Ten-fold cross-validation and independent tests show that this approach outperforms existing techniques. On a benchmark independent test set, it achieves a Matthews correlation coefficient, sensitivity, specificity, and accuracy of 0.8282, 0.9343, 0.8934, and 0.9143, respectively.
format Online
Article
Text
id pubmed-10627407
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10627407 2023-11-07 EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction Hou, Xiaoyang Wang, Yu Bu, Dongbo Wang, Yaojun Sun, Shiwei Bioinformatics Original Paper Oxford University Press 2023-11-01 /pmc/articles/PMC10627407/ /pubmed/37930896 http://dx.doi.org/10.1093/bioinformatics/btad650 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title EMNGly: predicting N-linked glycosylation sites using the language models for feature extraction
topic Original Paper