On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach
DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in functi...
Main Authors: | Qu, Yu-Hui; Yu, Hua; Gong, Xiu-Jun; Xu, Jia-Hui; Lee, Hong-Shun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5747425/ https://www.ncbi.nlm.nih.gov/pubmed/29287069 http://dx.doi.org/10.1371/journal.pone.0188129 |
_version_ | 1783289272598528000 |
---|---|
author | Qu, Yu-Hui; Yu, Hua; Gong, Xiu-Jun; Xu, Jia-Hui; Lee, Hong-Shun |
author_facet | Qu, Yu-Hui; Yu, Hua; Gong, Xiu-Jun; Xu, Jia-Hui; Lee, Hong-Shun |
author_sort | Qu, Yu-Hui |
collection | PubMed |
description | DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often focus on extracting physicochemical features from sequences while ignoring motif information and location information between motifs. Meanwhile, small data volumes and heavy noise in training data result in lower accuracy and reliability of predictions. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory neural network to identify their long-term dependencies, and binary cross-entropy to evaluate the quality of the neural networks. When the proposed method is tested on a realistic DNA-binding protein dataset, it achieves a prediction accuracy of 94.2% with a Matthews correlation coefficient of 0.961. Compared with LibSVM on the Arabidopsis and yeast datasets in independent tests, the accuracy rises by 9% and 4%, respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the others, while its sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins. |
format | Online Article Text |
id | pubmed-5747425 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5747425 2018-01-26 On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach Qu, Yu-Hui Yu, Hua Gong, Xiu-Jun Xu, Jia-Hui Lee, Hong-Shun PLoS One Research Article DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often focus on extracting physicochemical features from sequences while ignoring motif information and location information between motifs. Meanwhile, small data volumes and heavy noise in training data result in lower accuracy and reliability of predictions. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory neural network to identify their long-term dependencies, and binary cross-entropy to evaluate the quality of the neural networks. When the proposed method is tested on a realistic DNA-binding protein dataset, it achieves a prediction accuracy of 94.2% with a Matthews correlation coefficient of 0.961. Compared with LibSVM on the Arabidopsis and yeast datasets in independent tests, the accuracy rises by 9% and 4%, respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the others, while its sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
Public Library of Science 2017-12-29 /pmc/articles/PMC5747425/ /pubmed/29287069 http://dx.doi.org/10.1371/journal.pone.0188129 Text en © 2017 Qu et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Qu, Yu-Hui Yu, Hua Gong, Xiu-Jun Xu, Jia-Hui Lee, Hong-Shun On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title | On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title_full | On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title_fullStr | On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title_full_unstemmed | On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title_short | On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach |
title_sort | on the prediction of dna-binding proteins only from primary sequences: a deep learning approach |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5747425/ https://www.ncbi.nlm.nih.gov/pubmed/29287069 http://dx.doi.org/10.1371/journal.pone.0188129 |
work_keys_str_mv | AT quyuhui onthepredictionofdnabindingproteinsonlyfromprimarysequencesadeeplearningapproach AT yuhua onthepredictionofdnabindingproteinsonlyfromprimarysequencesadeeplearningapproach AT gongxiujun onthepredictionofdnabindingproteinsonlyfromprimarysequencesadeeplearningapproach AT xujiahui onthepredictionofdnabindingproteinsonlyfromprimarysequencesadeeplearningapproach AT leehongshun onthepredictionofdnabindingproteinsonlyfromprimarysequencesadeeplearningapproach |
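
The description above names two quantities without defining them: the binary cross-entropy used to evaluate the networks and the Matthews correlation coefficient reported alongside accuracy. The sketch below shows the standard textbook definitions of both in plain Python. It is not the authors' code; the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration and are unrelated to the paper's datasets or results.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: 1 = DNA-binding, 0 = non-binding.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6]
preds = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

loss = binary_cross_entropy(labels, probs)
mcc = matthews_corrcoef(labels, preds)
print(f"binary cross-entropy: {loss:.4f}")
print(f"MCC: {mcc:.4f}")
```

On this toy input two of three positives and two of three negatives are classified correctly, so the coefficient works out to (2·2 − 1·1) / √(3·3·3·3) = 1/3. The coefficient ranges from −1 to 1 and uses all four confusion-matrix counts, which is why it is often reported next to accuracy for binary classifiers.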