An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences

As the number of known proteins has grown, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods have been proposed to recognize DNA-binding proteins from amino acid sequences alone, such as SVM, DNABP and CNN-RNN....
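Sequence-only methods like those named above need a numeric representation of each protein before any model can be trained. The sketch below shows one common way to do that: map each amino acid letter to an integer and pad or truncate to a fixed length. It is an illustration under stated assumptions, not the authors' preprocessing; the index scheme, the `PAD` and `UNKNOWN` values, and the function name `encode_sequence` are all hypothetical.

```python
# Hypothetical preprocessing sketch; the record does not describe the
# authors' actual encoding, so every name and index here is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = 0        # assumed index reserved for padding
UNKNOWN = 21   # assumed index for any non-standard residue code

# Standard residues get indices 1..20 in the order listed above.
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS, start=1)}


def encode_sequence(sequence, max_len):
    """Map a protein sequence to a fixed-length list of integer indices."""
    indices = [AA_TO_INDEX.get(aa, UNKNOWN) for aa in sequence.upper()]
    indices = indices[:max_len]                          # truncate long sequences
    return indices + [PAD] * (max_len - len(indices))    # right-pad short ones
```

For example, `encode_sequence("ACD", 5)` returns `[1, 2, 3, 0, 0]`. A fixed-length integer list of this kind is the usual input to an embedding layer in front of convolutional or recurrent layers.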

Bibliographic Details
Main authors: Hu, Siquan, Ma, Ruixiong, Wang, Haiou
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855455/
https://www.ncbi.nlm.nih.gov/pubmed/31725778
http://dx.doi.org/10.1371/journal.pone.0225317
_version_ 1783470398157881344
author Hu, Siquan
Ma, Ruixiong
Wang, Haiou
collection PubMed
description As the number of known proteins has grown, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods have been proposed to recognize DNA-binding proteins from amino acid sequences alone, such as SVM, DNABP and CNN-RNN. However, these methods do not consider the context within amino acid sequences, which makes it difficult for them to capture sequence features adequately. In this study, a new method called CNN-BiLSTM, which combines a bidirectional long short-term memory (BiLSTM) recurrent neural network with a convolutional neural network (CNN), is proposed to identify DNA-binding proteins. The CNN-BiLSTM model can explore the potential contextual relationships in amino acid sequences and obtain more features than traditional models can. The experimental results show that CNN-BiLSTM achieves a validation set prediction accuracy of 96.5%, which is 7.8% higher than that of SVM, 9.6% higher than that of DNABP and 3.7% higher than that of CNN-RNN. On 20,000 independent samples provided by UniProt that were not involved in model training, the accuracy of CNN-BiLSTM reached 94.5%, which is 12% higher than that of SVM, 4.9% higher than that of DNABP and 4% higher than that of CNN-RNN. We visualized and compared the training process of CNN-BiLSTM with that of CNN-RNN and found that the former generalizes better from the training dataset, which suggests that CNN-BiLSTM adapts to a wider range of protein sequences. On the test set, CNN-BiLSTM is also more reliable, because its predicted scores are closer to the sample labels than those of CNN-RNN. Therefore, the proposed CNN-BiLSTM is a more powerful method for identifying DNA-binding proteins.
format Online
Article
Text
id pubmed-6855455
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6855455 2019-11-22 PLoS One Research Article
Public Library of Science 2019-11-14 /pmc/articles/PMC6855455/ /pubmed/31725778 http://dx.doi.org/10.1371/journal.pone.0225317 Text en © 2019 Hu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855455/
https://www.ncbi.nlm.nih.gov/pubmed/31725778
http://dx.doi.org/10.1371/journal.pone.0225317
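The description field above reports CNN-BiLSTM's accuracy together with its margin over each baseline, but not the baseline accuracies themselves. The sketch below derives them by simple subtraction. It rests on one loud assumption: that "X% higher" means X absolute percentage points, which the abstract does not state explicitly. The names `REPORTED` and `implied_baselines` are hypothetical.

```python
# Accuracies (%) and margins copied from the abstract in the description field.
REPORTED = {
    "validation": {
        "cnn_bilstm": 96.5,
        "margins": {"SVM": 7.8, "DNABP": 9.6, "CNN-RNN": 3.7},
    },
    "independent test": {
        "cnn_bilstm": 94.5,
        "margins": {"SVM": 12.0, "DNABP": 4.9, "CNN-RNN": 4.0},
    },
}


def implied_baselines(best, margins):
    """Return the baseline accuracies implied by the reported margins.

    ASSUMPTION: each margin is an absolute difference in percentage
    points, not a relative improvement.
    """
    return {name: round(best - margin, 1) for name, margin in margins.items()}


if __name__ == "__main__":
    for dataset, values in REPORTED.items():
        print(dataset, implied_baselines(values["cnn_bilstm"], values["margins"]))
```

Under that assumption, the implied validation accuracies are 88.7% for SVM, 86.9% for DNABP and 92.8% for CNN-RNN, and the implied independent-test accuracies are 82.5%, 89.6% and 90.5%. These are derived figures, not values quoted in the record; if the margins were meant as relative improvements, the numbers would differ.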