
Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning

DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) methods and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual featu...


Bibliographic Details
Main Authors: Li, Guobin, Du, Xiuquan, Li, Xinlu, Zou, Le, Zhang, Guanhong, Wu, Zhize
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8101451/
https://www.ncbi.nlm.nih.gov/pubmed/33986992
http://dx.doi.org/10.7717/peerj.11262
_version_ 1783688953858097152
author Li, Guobin
Du, Xiuquan
Li, Xinlu
Zou, Le
Zhang, Guanhong
Wu, Zhize
author_sort Li, Guobin
collection PubMed
description DNA-binding proteins (DBPs) play pivotal roles in many biological functions such as alternative splicing, RNA editing, and methylation. Many traditional machine learning (ML) and deep learning (DL) methods have been proposed to predict DBPs. However, these methods either rely on manual feature extraction or fail to capture long-term dependencies in the sequence. In this paper, we propose a method, called PDBP-Fusion, that identifies DBPs from primary sequences alone by fusing local features with long-term dependencies. We use a convolutional neural network (CNN) to learn local features and a bidirectional long short-term memory network (Bi-LSTM) to capture critical long-term dependencies in context. Feature extraction, model training, and model prediction are performed simultaneously. PDBP-Fusion predicts DBPs with 86.45% sensitivity, 79.13% specificity, 82.81% accuracy, and a Matthews correlation coefficient (MCC) of 0.661 on the PDB14189 benchmark dataset. The MCC of the proposed method is at least 9.1% higher than that of other advanced prediction models. PDBP-Fusion also achieves superior performance and model robustness on the PDB2272 independent dataset. These results demonstrate that PDBP-Fusion can predict DBPs from sequences accurately and effectively; the online server is at http://119.45.144.26:8080/PDBP-Fusion/.
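The description reports sensitivity, specificity, accuracy, and MCC. As a rough illustration of how these four figures relate to a binary confusion matrix, the sketch below computes them from raw counts. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the article; the record does not give the per-class counts behind the reported values, so the sketch does not attempt to reproduce them.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute four standard binary-classification metrics from counts.

    tp/fn: DNA-binding proteins predicted as binding / non-binding.
    tn/fp: non-binding proteins predicted as non-binding / binding.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = binary_metrics(tp=40, fn=10, tn=30, fp=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when sensitivity and specificity diverge, which is presumably why the description reports it alongside accuracy.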
format Online
Article
Text
id pubmed-8101451
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8101451 2021-05-12 Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning Li, Guobin Du, Xiuquan Li, Xinlu Zou, Le Zhang, Guanhong Wu, Zhize PeerJ Bioinformatics PeerJ Inc. 2021-05-03 /pmc/articles/PMC8101451/ /pubmed/33986992 http://dx.doi.org/10.7717/peerj.11262 Text en ©2021 Li et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning
topic Bioinformatics