Cargando…
CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by react...
Autores principales: | , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077735/ https://www.ncbi.nlm.nih.gov/pubmed/33902446 http://dx.doi.org/10.1186/s12859-021-04134-3 |
_version_ | 1783684934391562240 |
---|---|
author | Zuo, Yun Lin, Jianyuan Zeng, Xiangxiang Zou, Quan Liu, Xiangrong |
author_facet | Zuo, Yun Lin, Jianyuan Zeng, Xiangxiang Zou, Quan Liu, Xiangrong |
author_sort | Zuo, Yun |
collection | PubMed |
description | BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumor. Current experimental approaches used to predict carbonylation sites are expensive, time-consuming, and limited in protein processing abilities. Computational prediction of the carbonylation residue location in protein post-translational modifications enhances the functional characterization of proteins. RESULTS: In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. The resampling method K-means similarity-based undersampling and the synthetic minority oversampling technique (SMOTE-KSU) were incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest uses “support vector machine” subclassifications to divide three types of feature spaces into several subsets. CarSite-II gained Matthew’s correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, False Positive rate values of 0.2628/0.1084/0.1383/0.1313, False Negative rate values of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites by tenfold cross-validation, respectively. On our independent test dataset, CarSite-II yield MCC values of 0.6358/0.2910/0.4629/0.3685, False Positive rate values of 0.0165/0.0203/0.0188/0.0094, False Negative rate values of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools. CONCLUSION: The related results revealed that CarSite-II achieved better performance than the currently available five programs, and revealed the usefulness of the SMOTE-KSU resampling approach and integration algorithm. For the convenience of experimental scientists, the web tool of CarSite-II is available in http://47.100.136.41:8081/ SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04134-3. |
format | Online Article Text |
id | pubmed-8077735 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-80777352021-04-29 CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques Zuo, Yun Lin, Jianyuan Zeng, Xiangxiang Zou, Quan Liu, Xiangrong BMC Bioinformatics Methodology Article BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumor. Current experimental approaches used to predict carbonylation sites are expensive, time-consuming, and limited in protein processing abilities. Computational prediction of the carbonylation residue location in protein post-translational modifications enhances the functional characterization of proteins. RESULTS: In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. The resampling method K-means similarity-based undersampling and the synthetic minority oversampling technique (SMOTE-KSU) were incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest uses “support vector machine” subclassifications to divide three types of feature spaces into several subsets. CarSite-II gained Matthew’s correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, False Positive rate values of 0.2628/0.1084/0.1383/0.1313, False Negative rate values of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites by tenfold cross-validation, respectively. On our independent test dataset, CarSite-II yield MCC values of 0.6358/0.2910/0.4629/0.3685, False Positive rate values of 0.0165/0.0203/0.0188/0.0094, False Negative rate values of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools. CONCLUSION: The related results revealed that CarSite-II achieved better performance than the currently available five programs, and revealed the usefulness of the SMOTE-KSU resampling approach and integration algorithm. For the convenience of experimental scientists, the web tool of CarSite-II is available in http://47.100.136.41:8081/ SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04134-3. BioMed Central 2021-04-26 /pmc/articles/PMC8077735/ /pubmed/33902446 http://dx.doi.org/10.1186/s12859-021-04134-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Zuo, Yun Lin, Jianyuan Zeng, Xiangxiang Zou, Quan Liu, Xiangrong CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title | CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title_full | CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title_fullStr | CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title_full_unstemmed | CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title_short | CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques |
title_sort | carsite-ii: an integrated classification algorithm for identifying carbonylated sites based on k-means similarity-based undersampling and synthetic minority oversampling techniques |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077735/ https://www.ncbi.nlm.nih.gov/pubmed/33902446 http://dx.doi.org/10.1186/s12859-021-04134-3 |
work_keys_str_mv | AT zuoyun carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques AT linjianyuan carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques AT zengxiangxiang carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques AT zouquan carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques AT liuxiangrong carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques |