
Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features

BACKGROUND: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R),...


Bibliographic Details
Main authors: Weng, Shun-Long, Huang, Kai-Yao, Kaunang, Fergie Joanda, Huang, Chien-Hsun, Kao, Hui-Ju, Chang, Tzu-Hao, Wang, Hsin-Yao, Lu, Jang-Jih, Lee, Tzong-Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374553/
https://www.ncbi.nlm.nih.gov/pubmed/28361707
http://dx.doi.org/10.1186/s12859-017-1472-8
author Weng, Shun-Long
Huang, Kai-Yao
Kaunang, Fergie Joanda
Huang, Chien-Hsun
Kao, Hui-Ju
Chang, Tzu-Hao
Wang, Hsin-Yao
Lu, Jang-Jih
Lee, Tzong-Yi
author_sort Weng, Shun-Long
collection PubMed
description BACKGROUND: Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (CO) groups are produced, especially on lysine (K), arginine (R), threonine (T), and proline (P). Because little is known about the substrate specificity of carbonylation, we set out to develop a systematic method for a comprehensive investigation of protein carbonylation sites.
RESULTS: After redundant data were removed from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128, and 129 carbonylation sites on K, R, T, and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used not only to build a predictive model from the training dataset but also to compare its predictive performance with that of other feature types, including amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA), and physicochemical properties. The investigation of position-specific amino acid composition revealed that positively charged amino acids (K and R) are markedly enriched around carbonylated sites, which may help discriminate between carbonylation and non-carbonylation sites. Predictive models were built using various features and three different machine learning methods. In five-fold cross-validation, models trained with the PWM feature provided better sensitivity on the positive training dataset, while models trained with the AAindex feature achieved higher specificity on the negative training dataset. Additionally, the model trained with hybrid features (PWM, AAC, and AAindex) obtained the best MCC values of 0.432, 0.472, 0.443, and 0.467 on K, R, T, and P residues, respectively.
CONCLUSION: Compared with an existing prediction tool, the selected models trained with hybrid features provided promising accuracy on an independent testing dataset. In short, this work not only characterized the carbonylated substrate preference but also demonstrated that the proposed method could provide a feasible means of accelerating the preliminary discovery of protein carbonylation.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1472-8) contains supplementary material, which is available to authorized users.
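The abstract names amino acid composition (AAC) as a sequence feature and reports sensitivity, specificity, and MCC as evaluation measures. The following Python fragment is a minimal sketch of how such quantities are commonly computed; it is not the authors' code. The function names, the padding character, and the default flank of 10 residues are illustrative assumptions, since the record does not state the window size used in the paper.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window(sequence, center, flank=10, pad="-"):
    """Return the fragment of length 2*flank+1 centred on a 0-based position.

    The flank size and the pad character are assumptions for illustration.
    Positions beyond the protein termini are filled with the pad character.
    """
    left = max(0, center - flank)
    right = min(len(sequence), center + flank + 1)
    prefix = pad * (flank - (center - left))
    suffix = pad * (flank - (right - 1 - center))
    return prefix + sequence[left:right] + suffix


def aac(fragment):
    """Amino acid composition: frequency of each standard residue (pads ignored)."""
    counts = Counter(ch for ch in fragment if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def sensitivity(tp, fn):
    """Fraction of true sites that are predicted as sites."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-sites that are predicted as non-sites."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy sequence and toy confusion counts, not data from the paper.
    frag = window("MKRAKTPLLKR", center=4, flank=3)
    print(frag, aac(frag))
    print(sensitivity(40, 10), specificity(30, 20), mcc(40, 30, 20, 10))
```

MCC is a useful summary here because positive and negative site counts are typically unbalanced; it ranges from -1 to 1, with 0 corresponding to chance-level prediction.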
format Online
Article
Text
id pubmed-5374553
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5374553 2017-03-31 Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features Weng, Shun-Long Huang, Kai-Yao Kaunang, Fergie Joanda Huang, Chien-Hsun Kao, Hui-Ju Chang, Tzu-Hao Wang, Hsin-Yao Lu, Jang-Jih Lee, Tzong-Yi BMC Bioinformatics Research BioMed Central 2017-03-14 /pmc/articles/PMC5374553/ /pubmed/28361707 http://dx.doi.org/10.1186/s12859-017-1472-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374553/
https://www.ncbi.nlm.nih.gov/pubmed/28361707
http://dx.doi.org/10.1186/s12859-017-1472-8