Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network

Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic functions, so their prediction results are unsatisfactory. In this paper, we p...

Bibliographic Details
Main authors:	Sikander, Rahu; Wang, Yuping; Ghulam, Ali; Wu, Xianjuan
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670239/ https://www.ncbi.nlm.nih.gov/pubmed/34917128 http://dx.doi.org/10.3389/fgene.2021.759384

author	Sikander, Rahu; Wang, Yuping; Ghulam, Ali; Wu, Xianjuan
collection	PubMed
description	Predicting the protein sequence information of enzymes and non-enzymes is an important but very challenging task. Existing methods use either protein geometric structures alone or protein sequences alone to predict enzymatic functions, so their prediction results are unsatisfactory. In this paper, we propose a novel approach for predicting the amino acid sequences of enzymes and non-enzymes via a Convolutional Neural Network (CNN). In the CNN, the roles of enzymes are predicted from multiple sides of biological information, including information on sequences and structures. We propose the use of two-dimensional data via a 2D CNN to predict enzyme and non-enzyme proteins, using the same fivefold cross-validation procedure. We also use an independent dataset to test the performance of our model, and the results demonstrate that we are able to solve the overfitting problem. We used the proposed CNN model with a range of filter settings (32, 64, and 128 filters) to demonstrate its superiority, evaluating it on the fivefold cross-validation set and on the independent test set. Via the Dipeptide Deviation from Expected Mean (DDE) matrix, mutation information is extracted from amino acid sequences, and structural information with the distance and angle of amino acids is conveyed. The derived feature maps are then encoded using DDE. On the independent datasets, the model is then compared with two other methods, namely GRU and XGBoost. All analyses were conducted using 32, 64, and 128 filters in our proposed CNN method. The cross-validation datasets achieved an accuracy of 0.8762, whereas the accuracy on the independent datasets was 0.7621. The ROC AUC obtained with fivefold cross-validation was 0.95. The performance of our model and that of other models was compared in terms of sensitivity (0.9028) and specificity (0.8497). The overall accuracy of our model was 0.9133, compared with 0.8310 for the other model.
format	Online Article Text
id	pubmed-8670239
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8670239 2021-12-15 Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network Sikander, Rahu; Wang, Yuping; Ghulam, Ali; Wu, Xianjuan Front Genet Genetics Frontiers Media S.A. 2021-11-30 /pmc/articles/PMC8670239/ /pubmed/34917128 http://dx.doi.org/10.3389/fgene.2021.759384 Text en Copyright © 2021 Sikander, Wang, Ghulam and Wu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Identification of Enzymes-specific Protein Domain Based on DDE, and Convolutional Neural Network
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8670239/ https://www.ncbi.nlm.nih.gov/pubmed/34917128 http://dx.doi.org/10.3389/fgene.2021.759384
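
The description above names the Dipeptide Deviation from Expected Mean (DDE) encoding but does not spell it out. The following is a minimal Python sketch of DDE as it is commonly defined in the literature, not the authors' own code: for each of the 400 dipeptides, the observed frequency Dc is compared with a theoretical mean Tm derived from codon counts, and the difference is divided by the square root of the theoretical variance Tv. The function name `dde_features`, the alphabetical residue ordering, and the handling of non-standard residues are assumptions made for illustration.

```python
import math
from itertools import product

# The 20 standard amino acids (ordering is an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of codons encoding each amino acid in the standard genetic code.
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
TOTAL_CODONS = 61  # 64 codons minus the 3 stop codons


def dde_features(sequence):
    """Return the 400 DDE values for one protein sequence (hypothetical helper)."""
    seq = sequence.upper()
    n_pairs = len(seq) - 1  # number of overlapping dipeptides, L - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")

    # Count overlapping dipeptides; pairs containing a non-standard residue
    # are not counted, but they still contribute to n_pairs.
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1

    features = []
    for pair, count in counts.items():
        dc = count / n_pairs  # observed dipeptide composition
        tm = (CODON_COUNTS[pair[0]] / TOTAL_CODONS) * (
            CODON_COUNTS[pair[1]] / TOTAL_CODONS
        )  # theoretical mean
        tv = tm * (1 - tm) / n_pairs  # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features
```

The 400 values could be arranged as a 20 by 20 grid before being passed to a 2D CNN; that arrangement is an assumption here, since the record does not state the input shape the authors used.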
