Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data
Main authors: | Gana, Rajaram; Vasudevan, Sona |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6599295/ https://www.ncbi.nlm.nih.gov/pubmed/31253080 http://dx.doi.org/10.1186/s12860-019-0200-9 |
_version_ | 1783430933478637568 |
---|---|
author | Gana, Rajaram; Vasudevan, Sona |
collection | PubMed |
description | BACKGROUND: To date, no claim regarding finding a consensus sequon for O-glycosylation has been made. Thus, predicting the likelihood of O-glycosylation with sequence and structural information using classical regression analysis is quite difficult. In particular, if a binary response is used to distinguish between O-glycosylated and non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find. RESULTS: Sequences from three similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S/T site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. Results found include: 1) The consensus composite sequon for O-glycosylation is ~(W–S/T–W), where “~” denotes the “not” operator. 2) The consensus sequon for phosphorylation is ~(W–S/T/Y/H–W), although W–S/T/Y/H–W is not an absolute inhibitor of phosphorylation. 3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N – ~P – S/T is not an absolute inhibitor of O-glycosylation. 4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins. 5) Some N-glycosylated sequences are also phosphorylated at the S/T site in the N – ~P – S/T sequon. 6) Accessible surface area (ASA) values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences. 7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%. 8) With only sequence data, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not being glycosylated. The 95% confidence interval around this misprediction rate is 16% to 26%.
CONCLUSIONS: The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12860-019-0200-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6599295 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6599295 2019-07-11; BMC Mol Cell Biol; Research Article; BioMed Central 2019-06-28; © The Author(s). 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Ridge regression estimated linear probability model predictions of O-glycosylation in proteins with structural and sequence data |
topic | Research Article |
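The sequon notation in the abstract, ~(W–S/T–W) for O-glycosylation, ~(W–S/T/Y/H–W) for phosphorylation and N – ~P – S/T for N-glycosylation, can be read as a pattern test on the residues around each candidate site. The Python sketch below is a minimal illustration, not the authors' method. It assumes that each dash joins immediately adjacent residues and that a site at a sequence terminus (missing one neighbour) is not W-flanked. The function names `st_sites`, `passes_not_w_x_w` and `matches_n_glyc_sequon`, and the example peptide, are hypothetical.

```python
def st_sites(seq: str) -> list[int]:
    """Return the 0-based positions of serine (S) and threonine (T) residues."""
    return [i for i, aa in enumerate(seq) if aa in "ST"]


def passes_not_w_x_w(seq: str, i: int, site_residues: str = "ST") -> bool:
    """True if the site at position i satisfies ~(W-X-W), X in site_residues.

    Use site_residues="ST" for the O-glycosylation sequon and "STYH" for the
    phosphorylation one. Assumption: a dash joins immediately adjacent
    residues, so the site fails only when W sits directly on both sides;
    a site at either terminus lacks one neighbour and therefore passes.
    """
    if seq[i] not in site_residues:
        raise ValueError(f"residue {seq[i]!r} at position {i} is not in {site_residues}")
    left = seq[i - 1] if i > 0 else ""
    right = seq[i + 1] if i + 1 < len(seq) else ""
    return not (left == "W" and right == "W")


def matches_n_glyc_sequon(seq: str, i: int) -> bool:
    """True if the S/T at position i closes an N - ~P - S/T sequon:
    N two residues upstream and any residue except P in between."""
    return seq[i] in "ST" and i >= 2 and seq[i - 2] == "N" and seq[i - 1] != "P"


if __name__ == "__main__":
    seq = "AWSWNGTPWT"  # hypothetical peptide, not taken from the study
    for i in st_sites(seq):
        print(i, seq[i], passes_not_w_x_w(seq, i), matches_n_glyc_sequon(seq, i))
```

In this made-up peptide the S at position 2 sits between two W residues and so fails ~(W–S/T–W), while the T at position 6 passes it and also closes an N–G–T match for N – ~P – S/T, which mirrors the abstract's point that N-glycosylation sequons are not an absolute inhibitor of O-glycosylation.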
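A linear probability model (LPM) regresses a 0/1 label directly on the explanatory variables; per the abstract, the label separates O-glycosylated sequences (1) from N-glycosylated sequences used as a proxy for non-O-glycosylatable ones (0). Ridge regression adds an L2 penalty λ‖b‖² to stabilise the coefficients. The NumPy sketch below uses the closed-form ridge solution with an unpenalised intercept, then scores out-of-sample predictions with the two-sample Kolmogorov-Smirnov (KS) statistic. The penalty value, the centring choice, the feature encoding and the helper names (`fit_ridge_lpm`, `predict_lpm`, `ks_statistic`) are assumptions, since the abstract does not specify them. The abstract reports KS as a percentage; `ks_statistic` returns a fraction in [0, 1].

```python
import numpy as np


def fit_ridge_lpm(X: np.ndarray, y: np.ndarray, lam: float) -> tuple[float, np.ndarray]:
    """Ridge-estimated linear probability model y ~ b0 + X @ b for a 0/1 label y.

    Columns are centred so the intercept b0 is not penalised; the slopes solve
    (Xc.T @ Xc + lam * I) b = Xc.T @ (y - mean(y)).
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    penalty = lam * np.eye(X.shape[1])
    b = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ (y - y_mean))
    b0 = float(y_mean - x_mean @ b)
    return b0, b


def predict_lpm(b0: float, b: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Fitted 'probabilities'; as with any LPM they may fall outside [0, 1]."""
    return b0 + X @ b


def ks_statistic(scores: np.ndarray, y: np.ndarray) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs
    of the scores for y == 1 and for y == 0."""
    pos = np.sort(scores[y == 1])
    neg = np.sort(scores[y == 0])
    grid = np.concatenate([pos, neg])  # the largest gap occurs at an observed score
    cdf_pos = np.searchsorted(pos, grid, side="right") / pos.size
    cdf_neg = np.searchsorted(neg, grid, side="right") / neg.size
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


if __name__ == "__main__":
    # Synthetic data only: 5 hypothetical features, label driven by the first two.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(float)
    train, test = slice(0, 200), slice(200, 300)
    b0, b = fit_ridge_lpm(X[train], y[train], lam=1.0)
    scores = predict_lpm(b0, b, X[test])
    print("out-of-sample KS:", ks_statistic(scores, y[test]))
```

With `lam=0` the fit reduces to ordinary least squares; increasing `lam` shrinks the slopes toward zero, which helps when sequence and structural indicators are strongly collinear.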
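The abstract reports that 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not glycosylated, with a 95% confidence interval of 16% to 26%, but does not say how the interval was computed. That interval is symmetric about 21%, which is consistent with a normal-approximation (Wald) interval, p ± z·sqrt(p(1 − p)/n), so the sketch below uses that formula as an assumption. The 0.5 classification cut-off and the helper names `false_negatives` and `wald_ci` are also hypothetical, and no counts from the study are used.

```python
import math


def false_negatives(scores, labels, threshold: float = 0.5) -> tuple[int, int]:
    """Count positives (label 1) whose predicted score falls below threshold.

    Returns (misses, positives). The 0.5 default cut-off is an assumption.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    misses = sum(1 for s in positive_scores if s < threshold)
    return misses, len(positive_scores)


def wald_ci(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval for the proportion errors / n.

    z = 1.96 gives an approximate 95% interval; bounds are clipped to [0, 1].
    """
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

The two functions chain naturally: `wald_ci(*false_negatives(scores, labels))`. For small samples or rates near 0 or 1, a Wilson or exact (Clopper-Pearson) interval is usually preferred over the Wald interval.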