Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites

BACKGROUND: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutary...


Bibliographic Details
Main Authors: Huang, Kai-Yao, Kao, Hui-Ju, Hsu, Justin Bo-Kai, Weng, Shun-Long, Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394328/
https://www.ncbi.nlm.nih.gov/pubmed/30717647
http://dx.doi.org/10.1186/s12859-018-2394-9
author Huang, Kai-Yao
Kao, Hui-Ju
Hsu, Justin Bo-Kai
Weng, Shun-Long
Lee, Tzong-Yi
author_facet Huang, Kai-Yao
Kao, Hui-Ju
Hsu, Justin Bo-Kai
Weng, Shun-Long
Lee, Tzong-Yi
author_sort Huang, Kai-Yao
collection PubMed
description BACKGROUND: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites. RESULTS: The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites, according to support vector machine (SVM). CONCLUSIONS: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. 
Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2394-9) contains supplementary material, which is available to authorized users.
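The abstract names the sequence features AAC and CKSAAP without defining them. The following is a minimal Python sketch of how such encodings are commonly computed for a peptide window centred on a candidate lysine. It is an illustration, not the authors' implementation: the function names, the example window, the handling of non-standard symbols, and the normalisation by the number of valid pairs are all assumptions. Under the usual definition, AAPC corresponds to the k = 0 case of CKSAAP (adjacent pairs); that equivalence is also assumed here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(window: str) -> dict[str, float]:
    """Amino acid composition: fraction of each standard residue in the window."""
    # Assumption: gap or unknown symbols (e.g. '-' or 'X') are ignored.
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues) or 1  # avoid division by zero on empty input
    return {a: residues.count(a) / total for a in AMINO_ACIDS}


def cksaap(window: str, k: int) -> dict[str, float]:
    """Frequencies of residue pairs with exactly k residues between them."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = 0
    for i in range(len(window) - k - 1):
        pair = window[i] + window[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            n_pairs += 1
    n_pairs = n_pairs or 1  # avoid division by zero when no valid pair exists
    # Assumption: normalise by the number of valid pairs found in the window.
    return {p: c / n_pairs for p, c in counts.items()}


if __name__ == "__main__":
    example = "AKAKA"  # hypothetical toy window, not taken from the dataset
    print(aac(example)["A"])           # fraction of alanine in the window
    print(cksaap(example, 0)["AK"])    # adjacent A-K pairs (AAPC-style)
    print(cksaap(example, 1)["AA"])    # A _ A pairs with one residue between
```

Each function returns a fixed-length vector (20 values for AAC, 400 per value of k for CKSAAP), which is the form a classifier such as an SVM expects as input.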
format Online
Article
Text
id pubmed-7394328
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7394328 2020-08-05 Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites Huang, Kai-Yao Kao, Hui-Ju Hsu, Justin Bo-Kai Weng, Shun-Long Lee, Tzong-Yi BMC Bioinformatics Research BioMed Central 2019-02-04 /pmc/articles/PMC7394328/ /pubmed/30717647 http://dx.doi.org/10.1186/s12859-018-2394-9 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Huang, Kai-Yao
Kao, Hui-Ju
Hsu, Justin Bo-Kai
Weng, Shun-Long
Lee, Tzong-Yi
Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites
title Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7394328/
https://www.ncbi.nlm.nih.gov/pubmed/30717647
http://dx.doi.org/10.1186/s12859-018-2394-9