
A novel two-way rebalancing strategy for identifying carbonylation sites

BACKGROUND: As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and aging. Protein carbonylation prediction for related patients is significant, which can help clinicians make appropriate therapeutic schemes. Because carbonylation sites can b...


Bibliographic Details
Main Authors: Chen, Linjun, Jing, Xiao-Yuan, Hao, Yaru, Liu, Wei, Zhu, Xiaoke, Han, Wei
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10644465/
https://www.ncbi.nlm.nih.gov/pubmed/37957582
http://dx.doi.org/10.1186/s12859-023-05551-2
_version_ 1785147234554216448
author Chen, Linjun
Jing, Xiao-Yuan
Hao, Yaru
Liu, Wei
Zhu, Xiaoke
Han, Wei
author_facet Chen, Linjun
Jing, Xiao-Yuan
Hao, Yaru
Liu, Wei
Zhu, Xiaoke
Han, Wei
author_sort Chen, Linjun
collection PubMed
description BACKGROUND: As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and to aging. Predicting protein carbonylation in affected patients is important because it can help clinicians choose appropriate therapeutic schemes. Because carbonylation sites can indicate a change or loss of protein function, integrating protein carbonylation site data has been a promising approach to prediction. Several protein carbonylation prediction methods have been proposed on the basis of these data. However, most of the data are highly class-imbalanced: the number of un-carbonylation sites greatly exceeds that of carbonylation sites. Existing methods have not addressed this issue adequately. RESULTS: In this work, we propose a novel two-way rebalancing strategy based on the attention technique and a generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan uses a novel attention-based undersampling method that selects the sites with high importance values from the un-carbonylation sites; the attention technique provides the importance value of each sample. Meanwhile, Carsite_AGan uses a generative adversarial network-based oversampling method, in which the generator and discriminator together produce high-feasibility carbonylation sites. Finally, we use a classifier such as a nonlinear support vector machine to identify protein carbonylation sites. CONCLUSIONS: Experimental results demonstrate that our approach significantly outperforms other resampling methods. Resampling carbonylation data with our approach can significantly improve the identification of protein carbonylation sites.
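The two-way rebalancing idea in the description can be illustrated with a minimal Python sketch. It is not the Carsite_AGan implementation: the importance score here is assumed to be a softmax over scaled dot products against the minority-class centroid, and the oversampling step is a Gaussian-jitter stand-in rather than a trained generative adversarial network. All function names, the toy data, and the `target` and `noise` parameters are hypothetical.

```python
import math
import random


def attention_scores(majority, query):
    """Softmax over scaled dot products between each majority sample and a query vector."""
    scale = math.sqrt(len(query))
    logits = [sum(a * b for a, b in zip(x, query)) / scale for x in majority]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def undersample(majority, scores, keep):
    """Keep the `keep` majority samples with the highest importance scores."""
    ranked = sorted(range(len(majority)), key=lambda i: scores[i], reverse=True)
    return [majority[i] for i in ranked[:keep]]


def oversample(minority, target, rng, noise=0.05):
    """Stand-in for a GAN generator (assumption): jitter randomly chosen minority samples."""
    synthetic = []
    while len(minority) + len(synthetic) < target:
        base = rng.choice(minority)
        synthetic.append([v + rng.gauss(0.0, noise) for v in base])
    return minority + synthetic


def rebalance(majority, minority, target, seed=0):
    """Two-way rebalancing: shrink the majority class and grow the minority class to `target`."""
    rng = random.Random(seed)
    dim = len(minority[0])
    # Assumed query vector: the centroid of the minority (carbonylation) class.
    centroid = [sum(x[d] for x in minority) / len(minority) for d in range(dim)]
    scores = attention_scores(majority, centroid)
    return undersample(majority, scores, target), oversample(minority, target, rng)


# Toy data (made up): 20 un-carbonylation and 4 carbonylation feature vectors.
_rng = random.Random(1)
negatives = [[_rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
positives = [[_rng.uniform(0.5, 1) for _ in range(3)] for _ in range(4)]
neg_balanced, pos_balanced = rebalance(negatives, positives, target=8)
print(len(neg_balanced), len(pos_balanced))
```

The sketch assumes `target` lies between the minority and majority class sizes. The two balanced sets would then be passed to a classifier, for example a nonlinear support vector machine as the description mentions.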
format Online
Article
Text
id pubmed-10644465
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10644465 2023-11-13 A novel two-way rebalancing strategy for identifying carbonylation sites Chen, Linjun Jing, Xiao-Yuan Hao, Yaru Liu, Wei Zhu, Xiaoke Han, Wei BMC Bioinformatics Research BioMed Central 2023-11-13 /pmc/articles/PMC10644465/ /pubmed/37957582 http://dx.doi.org/10.1186/s12859-023-05551-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Chen, Linjun
Jing, Xiao-Yuan
Hao, Yaru
Liu, Wei
Zhu, Xiaoke
Han, Wei
A novel two-way rebalancing strategy for identifying carbonylation sites
title A novel two-way rebalancing strategy for identifying carbonylation sites
title_full A novel two-way rebalancing strategy for identifying carbonylation sites
title_fullStr A novel two-way rebalancing strategy for identifying carbonylation sites
title_full_unstemmed A novel two-way rebalancing strategy for identifying carbonylation sites
title_short A novel two-way rebalancing strategy for identifying carbonylation sites
title_sort novel two-way rebalancing strategy for identifying carbonylation sites
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10644465/
https://www.ncbi.nlm.nih.gov/pubmed/37957582
http://dx.doi.org/10.1186/s12859-023-05551-2