N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding
N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance....
Main authors: | Pitti, Thejkiran; Chen, Ching-Tai; Lin, Hsin-Nan; Choong, Wai-Kok; Hsu, Wen-Lian; Sung, Ting-Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6828726/ https://www.ncbi.nlm.nih.gov/pubmed/31685900 http://dx.doi.org/10.1038/s41598-019-52341-z |
_version_ | 1783465413576753152 |
---|---|
author | Pitti, Thejkiran Chen, Ching-Tai Lin, Hsin-Nan Choong, Wai-Kok Hsu, Wen-Lian Sung, Ting-Yi |
author_facet | Pitti, Thejkiran Chen, Ching-Tai Lin, Hsin-Nan Choong, Wai-Kok Hsu, Wen-Lian Sung, Ting-Yi |
author_sort | Pitti, Thejkiran |
collection | PubMed |
description | N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, not confined to asparagine in the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously-constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a score for a protein to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites by utilizing features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE’s final predictions are derived from a weight adjustment of the second-stage prediction results based on the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprised of 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy and MCC of 0.740 and 0.499, respectively, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/. |
format | Online Article Text |
id | pubmed-6828726 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6828726 2019-11-12 N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding Pitti, Thejkiran Chen, Ching-Tai Lin, Hsin-Nan Choong, Wai-Kok Hsu, Wen-Lian Sung, Ting-Yi Sci Rep Article 
Nature Publishing Group UK 2019-11-04 /pmc/articles/PMC6828726/ /pubmed/31685900 http://dx.doi.org/10.1038/s41598-019-52341-z Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Pitti, Thejkiran Chen, Ching-Tai Lin, Hsin-Nan Choong, Wai-Kok Hsu, Wen-Lian Sung, Ting-Yi N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title_full | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title_fullStr | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title_full_unstemmed | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title_short | N-GlyDE: a two-stage N-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
title_sort | n-glyde: a two-stage n-linked glycosylation site prediction incorporating gapped dipeptides and pattern-based encoding |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6828726/ https://www.ncbi.nlm.nih.gov/pubmed/31685900 http://dx.doi.org/10.1038/s41598-019-52341-z |
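
The description above mentions three ideas that a short sketch can make concrete: restricting evaluation to N-X-S/T sequons, encoding sequence context as gapped dipeptides, and reporting accuracy and MCC. The following Python is an illustrative sketch only and is not N-GlyDE's implementation. The function names, the `max_gap` parameter, and the window passed to the feature function are hypothetical, and the exclusion of proline at position X follows the usual sequon convention rather than anything stated in this record.

```python
import math
import re
from collections import Counter

# Lookahead keeps overlapping sequons (e.g. "NNSS") from being skipped.
# Assumption: X is any residue except proline (standard sequon convention).
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 0-based positions of N in N-X-S/T sequons."""
    return [m.start() for m in SEQUON.finditer(seq)]


def gapped_dipeptides(window, max_gap=2):
    """Count pairs (a, gap, b): residue b follows a with `gap` residues between.

    `max_gap` is a hypothetical setting, not a value taken from the paper.
    """
    counts = Counter()
    for gap in range(max_gap + 1):
        for i in range(len(window) - gap - 1):
            counts[(window[i], gap, window[i + gap + 1])] += 1
    return counts


def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from a confusion matrix."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

In a pipeline of this kind, `find_sequons` would select candidate asparagines, a window around each candidate would be passed to `gapped_dipeptides` to build a feature vector for a classifier, and `accuracy_and_mcc` would summarize predictions over sequons only. The first-stage protein score and the weight adjustment described in the abstract are not sketched here because this record does not give their formulas.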