Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites
BACKGROUND: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutary...
Main authors: | Huang, Kai-Yao Kao, Hui-Ju Hsu, Justin Bo-Kai Weng, Shun-Long Lee, Tzong-Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394328/ https://www.ncbi.nlm.nih.gov/pubmed/30717647 http://dx.doi.org/10.1186/s12859-018-2394-9 |
_version_ | 1783565210308575232 |
---|---|
author | Huang, Kai-Yao Kao, Hui-Ju Hsu, Justin Bo-Kai Weng, Shun-Long Lee, Tzong-Yi |
author_facet | Huang, Kai-Yao Kao, Hui-Ju Hsu, Justin Bo-Kai Weng, Shun-Long Lee, Tzong-Yi |
author_sort | Huang, Kai-Yao |
collection | PubMed |
description | BACKGROUND: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites. RESULTS: The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites, according to support vector machine (SVM). CONCLUSIONS: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. 
Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2394-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-7394328 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7394328 2020-08-05 Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites Huang, Kai-Yao Kao, Hui-Ju Hsu, Justin Bo-Kai Weng, Shun-Long Lee, Tzong-Yi BMC Bioinformatics Research BioMed Central 2019-02-04 /pmc/articles/PMC7394328/ /pubmed/30717647 http://dx.doi.org/10.1186/s12859-018-2394-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394328/ https://www.ncbi.nlm.nih.gov/pubmed/30717647 http://dx.doi.org/10.1186/s12859-018-2394-9 |
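The description above names composition features (AAC, CKSAAP) and reports sensitivity, specificity, accuracy, and MCC. The following is a minimal Python sketch of how such quantities are commonly computed; it is not the authors' implementation. The function names, the example peptide, and the confusion-matrix counts are assumptions for illustration. In particular, the counts TP=30, FN=16, TN=68, FP=24 are inferred from the reported independent test set (46 glutarylated and 92 non-glutarylated sites, sensitivity 0.652, specificity 0.739) and are not stated in the record.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues) or 1  # avoid division by zero on an empty window
    return [counts[a] / total for a in AMINO_ACIDS]


def cksaap(window, k):
    """Composition of k-spaced pairs: frequency of residue pairs with k residues between them."""
    pairs = [window[i] + window[i + k + 1] for i in range(len(window) - k - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {p: c / total for p, c in counts.items()}


def metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    peptide = "AKAK"  # hypothetical toy window, not from the dataset
    print(aac(peptide))
    print(cksaap(peptide, 1))
    # Counts inferred from the reported independent test set (assumption).
    print(metrics(tp=30, fn=16, tn=68, fp=24))
```

With the inferred counts, sensitivity and specificity round to the reported 0.652 and 0.739. The accuracy and MCC this sketch prints for the independent set follow from the same assumption and are not reported in the record.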