Identification of the ubiquitin–proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network

Bibliographic Details
Main Authors: Sikander, Rahu; Arif, Muhammad; Ghulam, Ali; Worachartcheewan, Apilak; Thafar, Maha A.; Habib, Shabana
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9355632/
https://www.ncbi.nlm.nih.gov/pubmed/35937990
http://dx.doi.org/10.3389/fgene.2022.851688
author Sikander, Rahu
Arif, Muhammad
Ghulam, Ali
Worachartcheewan, Apilak
Thafar, Maha A.
Habib, Shabana
collection PubMed
description The major mechanism of proteolysis in the cytosol and nucleus is the ubiquitin–proteasome pathway (UPP). The tightly controlled UPP affects a wide range of cellular processes and substrates, and defects in the system can lead to the pathogenesis of a number of serious human diseases. Knowledge about UPPs provides useful hints for understanding cellular processes and for drug discovery. The exponential growth of next-generation sequencing wet-lab approaches has accelerated the accumulation of unannotated data in online databases, making UPP characterization and analysis more challenging. Computational methods are therefore used as an alternative for the fast and accurate identification of UPPs. To this end, we develop a novel deep learning-based predictor named “2DCNN-UPP” for identifying UPPs with a low error rate. The proposed method combines a two-dimensional convolutional neural network (2D-CNN) with dipeptide deviation features. To avoid overfitting, a genetic algorithm is employed to select the optimal features. Finally, the optimized feature set is fed as input to the 2D-CNN learning engine to build the model. The model was trained with 10-fold cross-validation, which was used to measure its overall accuracy and AUC (ROC), and was then evaluated on an independent test set; it outperformed other state-of-the-art methods for UPP classification. As experimentally validated ubiquitination sites emerge, a proteomics-based predictor of ubiquitination will need to be devised. On the independent test, which assesses the generalization power of the trained model, 2DCNN-UPP obtained an accuracy of 0.862, a sensitivity of 0.921, a specificity of 0.803, and a Matthews correlation coefficient (MCC) of 0.730.
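The four independent-test figures reported above follow the standard confusion-matrix definitions. The sketch below shows how they are computed, assuming UPP proteins are the positive class; the counts in the example are hypothetical and are not the paper's data, and returning an MCC of 0 when a marginal count is zero is a common convention rather than something the record specifies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (positive class = UPP, by assumption)."""
    tp: int
    tn: int
    fp: int
    fn: int


def binary_metrics(c: Confusion) -> dict[str, float]:
    """Return accuracy, sensitivity, specificity and MCC for one confusion matrix."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total
    # Sensitivity = recall on positives; specificity = recall on negatives.
    sensitivity = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    specificity = c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    # Convention: report MCC as 0 when any marginal count is zero.
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(Confusion(tp=90, tn=80, fp=20, fn=10)))
```

When the positive and negative sets are the same size, accuracy equals the mean of sensitivity and specificity. The independent-test figures fit that pattern ((0.921 + 0.803) / 2 = 0.862), although the record does not report the test-set sizes.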
Four approaches were applied to the sequences, and the physical properties they yield were calculated and combined. Under 10-fold cross-validation, 2DCNN-UPP obtained an AUC (ROC) of 0.862. We also analyzed the predicted scores of UPP and non-UPP proteins. Finally, this research can be used to analyze the large-scale relationship between UPP and non-UPP proteins in particular, and other protein problems in general, and may help advance computational biology research. We could also incorporate the latest features into our model framework and use Dipeptide Deviation from Expected Mean (DDE)-based protein features for predicting the structure and function of proteins and of other molecules, such as DNA and RNA.
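The DDE descriptor mentioned above can be sketched as follows, using the definition under which it is commonly described: for each of the 400 dipeptides rs, the observed composition Dc is its count divided by the N − 1 overlapping pairs; the theoretical mean Tm is (Cr/61)·(Cs/61), where Cr is the number of codons for residue r out of 61 sense codons; the theoretical variance is Tv = Tm(1 − Tm)/(N − 1); and DDE = (Dc − Tm)/√Tv. Two points are assumptions not stated in the record: the reshaping of the 400 values into a 20×20 grid as one plausible input layout for a 2D-CNN, and the dropping of non-standard residues. The genetic-algorithm feature selection step is not modeled here.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Codons per amino acid in the standard genetic code (61 sense codons in total).
CODONS = {"A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2,
          "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4,
          "W": 1, "Y": 2}
TOTAL_CODONS = sum(CODONS.values())  # 61


def dde_features(seq: str) -> list[float]:
    """Return the 400-dimensional DDE vector for a protein sequence."""
    # Assumption: non-standard residues are simply dropped, which joins their
    # neighbours into a new dipeptide.
    seq = "".join(ch for ch in seq.upper() if ch in CODONS)
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence needs at least two standard residues")
    counts: dict[str, int] = {}
    for a, b in zip(seq, seq[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    features = []
    for r, s in product(AMINO_ACIDS, repeat=2):
        dc = counts.get(r + s, 0) / n_pairs                             # observed composition
        tm = (CODONS[r] / TOTAL_CODONS) * (CODONS[s] / TOTAL_CODONS)    # theoretical mean
        tv = tm * (1 - tm) / n_pairs                                    # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features


def as_grid(features: list[float], size: int = 20) -> list[list[float]]:
    """Hypothetical 2D layout: row = first residue, column = second residue."""
    if len(features) != size * size:
        raise ValueError("expected a size x size feature vector")
    return [features[i * size:(i + 1) * size] for i in range(size)]
```

Because Tm always lies strictly between 0 and 1, Tv is positive for any sequence with at least one dipeptide, so the division is always defined. Dipeptides absent from the sequence receive negative scores.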
format Online
Article
Text
id pubmed-9355632
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9355632 2022-08-06 Front Genet Genetics Frontiers Media S.A. 2022-07-22 /pmc/articles/PMC9355632/ /pubmed/35937990 http://dx.doi.org/10.3389/fgene.2022.851688 Text en Copyright © 2022 Sikander, Arif, Ghulam, Worachartcheewan, Thafar and Habib. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of the ubiquitin–proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9355632/
https://www.ncbi.nlm.nih.gov/pubmed/35937990
http://dx.doi.org/10.3389/fgene.2022.851688