An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences
As the number of known proteins has grown, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods have been proposed to recognize DNA-binding proteins from amino acid sequences alone, such as SVM, DNABP and CNN-RNN....
Main authors: | Hu, Siquan; Ma, Ruixiong; Wang, Haiou |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855455/ https://www.ncbi.nlm.nih.gov/pubmed/31725778 http://dx.doi.org/10.1371/journal.pone.0225317 |
_version_ | 1783470398157881344 |
---|---|
author | Hu, Siquan; Ma, Ruixiong; Wang, Haiou |
author_facet | Hu, Siquan; Ma, Ruixiong; Wang, Haiou |
author_sort | Hu, Siquan |
collection | PubMed |
description | As the number of known proteins has grown, accurately identifying DNA-binding proteins has become a significant biological challenge. Various computational methods have been proposed to recognize DNA-binding proteins from amino acid sequences alone, such as SVM, DNABP and CNN-RNN. However, these methods do not consider the context within amino acid sequences, which makes it difficult for them to capture sequence features adequately. In this study, a new method called CNN-BiLSTM, which combines a bidirectional long short-term memory (BiLSTM) recurrent neural network with a convolutional neural network (CNN), is proposed to identify DNA-binding proteins. The CNN-BiLSTM model can explore the potential contextual relationships in amino acid sequences and extract more features than traditional models can. The experimental results show that CNN-BiLSTM achieves a validation-set prediction accuracy of 96.5%, which is 7.8% higher than that of SVM, 9.6% higher than that of DNABP and 3.7% higher than that of CNN-RNN. When tested on 20,000 independent samples provided by UniProt that were not involved in model training, CNN-BiLSTM reached an accuracy of 94.5%, which is 12% higher than that of SVM, 4.9% higher than that of DNABP and 4% higher than that of CNN-RNN. We visualized and compared the training processes of CNN-BiLSTM and CNN-RNN and found that the former generalizes better from the training dataset, indicating that CNN-BiLSTM adapts to a wider range of protein sequences. On the test set, CNN-BiLSTM is also more reliable, because its predicted scores are closer to the sample labels than those of CNN-RNN. The proposed CNN-BiLSTM is therefore a more powerful method for identifying DNA-binding proteins. |
format | Online Article Text |
id | pubmed-6855455 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6855455 2019-11-22 PLoS One Research Article Public Library of Science 2019-11-14 /pmc/articles/PMC6855455/ /pubmed/31725778 http://dx.doi.org/10.1371/journal.pone.0225317 Text en © 2019 Hu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6855455/ https://www.ncbi.nlm.nih.gov/pubmed/31725778 http://dx.doi.org/10.1371/journal.pone.0225317 |
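
The description field reports CNN-BiLSTM accuracies together with its margins over SVM, DNABP and CNN-RNN, but not the baseline accuracies themselves. The short sketch below works out the baseline figures those margins imply. It assumes that "X% higher" means X percentage points rather than a relative gain, which the record does not state explicitly. The function and variable names are illustrative and do not come from the article.

```python
# Accuracies reported in the abstract for CNN-BiLSTM, in percent.
CNN_BILSTM = {"validation": 96.5, "independent_test": 94.5}

# Reported margins of CNN-BiLSTM over each baseline.
# ASSUMPTION: "X% higher" is read as X percentage points, not a relative gain.
MARGINS = {
    "validation": {"SVM": 7.8, "DNABP": 9.6, "CNN-RNN": 3.7},
    "independent_test": {"SVM": 12.0, "DNABP": 4.9, "CNN-RNN": 4.0},
}

# Size of the independent UniProt test set stated in the abstract.
INDEPENDENT_SAMPLES = 20_000


def implied_baselines(split):
    """Return the baseline accuracies (percent) implied by the reported margins."""
    top = CNN_BILSTM[split]
    return {name: round(top - margin, 1) for name, margin in MARGINS[split].items()}


def implied_correct_count(accuracy_percent, n_samples):
    """Return the number of correct predictions implied by an accuracy figure."""
    return round(accuracy_percent / 100 * n_samples)


if __name__ == "__main__":
    for split in CNN_BILSTM:
        print(split, implied_baselines(split))
    print(
        "correct on independent set:",
        implied_correct_count(CNN_BILSTM["independent_test"], INDEPENDENT_SAMPLES),
    )
```

Under the percentage-point reading, the margin of CNN-BiLSTM over CNN-RNN widens slightly from the validation set to the independent test set, while its margin over DNABP narrows. If the authors intended relative improvements instead, the implied baseline values would differ.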