Cargando…

CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques

BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by react...

Descripción completa

Detalles Bibliográficos
Autores principales: Zuo, Yun, Lin, Jianyuan, Zeng, Xiangxiang, Zou, Quan, Liu, Xiangrong
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077735/
https://www.ncbi.nlm.nih.gov/pubmed/33902446
http://dx.doi.org/10.1186/s12859-021-04134-3
_version_ 1783684934391562240
author Zuo, Yun
Lin, Jianyuan
Zeng, Xiangxiang
Zou, Quan
Liu, Xiangrong
author_facet Zuo, Yun
Lin, Jianyuan
Zeng, Xiangxiang
Zou, Quan
Liu, Xiangrong
author_sort Zuo, Yun
collection PubMed
description BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumor. Current experimental approaches used to predict carbonylation sites are expensive, time-consuming, and limited in protein processing abilities. Computational prediction of the carbonylation residue location in protein post-translational modifications enhances the functional characterization of proteins. RESULTS: In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. The resampling method K-means similarity-based undersampling and the synthetic minority oversampling technique (SMOTE-KSU) were incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest uses “support vector machine” subclassifications to divide three types of feature spaces into several subsets. CarSite-II gained Matthew’s correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, False Positive rate values of 0.2628/0.1084/0.1383/0.1313, False Negative rate values of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites by tenfold cross-validation, respectively. On our independent test dataset, CarSite-II yield MCC values of 0.6358/0.2910/0.4629/0.3685, False Positive rate values of 0.0165/0.0203/0.0188/0.0094, False Negative rate values of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools. CONCLUSION: The related results revealed that CarSite-II achieved better performance than the currently available five programs, and revealed the usefulness of the SMOTE-KSU resampling approach and integration algorithm. For the convenience of experimental scientists, the web tool of CarSite-II is available in http://47.100.136.41:8081/ SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04134-3.
format Online
Article
Text
id pubmed-8077735
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-80777352021-04-29 CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques Zuo, Yun Lin, Jianyuan Zeng, Xiangxiang Zou, Quan Liu, Xiangrong BMC Bioinformatics Methodology Article BACKGROUND: Carbonylation is a non-enzymatic irreversible protein post-translational modification, and refers to the side chain of amino acid residues being attacked by reactive oxygen species and finally converted into carbonyl products. Studies have shown that protein carbonylation caused by reactive oxygen species is involved in the etiology and pathophysiological processes of aging, neurodegenerative diseases, inflammation, diabetes, amyotrophic lateral sclerosis, Huntington’s disease, and tumor. Current experimental approaches used to predict carbonylation sites are expensive, time-consuming, and limited in protein processing abilities. Computational prediction of the carbonylation residue location in protein post-translational modifications enhances the functional characterization of proteins. RESULTS: In this study, an integrated classifier algorithm, CarSite-II, was developed to identify K, P, R, and T carbonylated sites. The resampling method K-means similarity-based undersampling and the synthetic minority oversampling technique (SMOTE-KSU) were incorporated to balance the proportions of K, P, R, and T carbonylated training samples. Next, the integrated classifier system Rotation Forest uses “support vector machine” subclassifications to divide three types of feature spaces into several subsets. CarSite-II gained Matthew’s correlation coefficient (MCC) values of 0.2287/0.3125/0.2787/0.2814, False Positive rate values of 0.2628/0.1084/0.1383/0.1313, False Negative rate values of 0.2252/0.0205/0.0976/0.0608 for K/P/R/T carbonylation sites by tenfold cross-validation, respectively. On our independent test dataset, CarSite-II yield MCC values of 0.6358/0.2910/0.4629/0.3685, False Positive rate values of 0.0165/0.0203/0.0188/0.0094, False Negative rate values of 0.1026/0.1875/0.2037/0.3333 for K/P/R/T carbonylation sites. The results show that CarSite-II achieves remarkably better performance than all currently available prediction tools. CONCLUSION: The related results revealed that CarSite-II achieved better performance than the currently available five programs, and revealed the usefulness of the SMOTE-KSU resampling approach and integration algorithm. For the convenience of experimental scientists, the web tool of CarSite-II is available in http://47.100.136.41:8081/ SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04134-3. BioMed Central 2021-04-26 /pmc/articles/PMC8077735/ /pubmed/33902446 http://dx.doi.org/10.1186/s12859-021-04134-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Zuo, Yun
Lin, Jianyuan
Zeng, Xiangxiang
Zou, Quan
Liu, Xiangrong
CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title_full CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title_fullStr CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title_full_unstemmed CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title_short CarSite-II: an integrated classification algorithm for identifying carbonylated sites based on K-means similarity-based undersampling and synthetic minority oversampling techniques
title_sort carsite-ii: an integrated classification algorithm for identifying carbonylated sites based on k-means similarity-based undersampling and synthetic minority oversampling techniques
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8077735/
https://www.ncbi.nlm.nih.gov/pubmed/33902446
http://dx.doi.org/10.1186/s12859-021-04134-3
work_keys_str_mv AT zuoyun carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques
AT linjianyuan carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques
AT zengxiangxiang carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques
AT zouquan carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques
AT liuxiangrong carsiteiianintegratedclassificationalgorithmforidentifyingcarbonylatedsitesbasedonkmeanssimilaritybasedundersamplingandsyntheticminorityoversamplingtechniques