On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach

Bibliographic Details
Main authors: Qu, Yu-Hui; Yu, Hua; Gong, Xiu-Jun; Xu, Jia-Hui; Lee, Hong-Shun
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5747425/
https://www.ncbi.nlm.nih.gov/pubmed/29287069
http://dx.doi.org/10.1371/journal.pone.0188129
author Qu, Yu-Hui
Yu, Hua
Gong, Xiu-Jun
Xu, Jia-Hui
Lee, Hong-Shun
collection PubMed
description DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylation and many other biological functions in both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acid sequences is becoming one of the major challenges in the functional annotation of genomes. Traditional prediction methods often concentrate on extracting physicochemical features from sequences while ignoring motif information and the location information between motifs. Meanwhile, small data volumes and high noise in the training data result in lower accuracy and reliability of predictions. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It uses two stages of convolutional neural networks to detect the functional domains of protein sequences, a long short-term memory (LSTM) neural network to identify their long-term dependencies, and binary cross-entropy to evaluate the quality of the neural networks. When the proposed method is tested on a realistic DNA-binding protein dataset, it achieves a prediction accuracy of 94.2% at a Matthews correlation coefficient (MCC) of 0.961. Compared with LibSVM in independent tests on the Arabidopsis and yeast datasets, the accuracy rises by 9% and 4%, respectively. Comparative experiments using different feature extraction methods show that our model achieves accuracy similar to the best of the other methods, while its sensitivity, specificity and AUC are higher by 27.83%, 1.31% and 16.21%, respectively. These results suggest that our method is a promising tool for identifying DNA-binding proteins.
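The description names binary cross-entropy as the criterion used to assess the networks and reports accuracy, sensitivity, specificity, the Matthews correlation coefficient and AUC. This record does not include the authors' code, so the following stdlib-only Python sketch only illustrates the standard definitions of these quantities, assuming 0/1 labels (1 = DNA-binding) and predicted probabilities. The function names, the clipping constant `eps` and the 0.5 decision threshold are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical helper names: standard metric definitions, not the authors' implementation.


def binary_cross_entropy(y_true: Sequence[int], y_prob: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall on positives), specificity and MCC."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties count 1/2).

    Pairwise O(n_pos * n_neg) version, adequate for small illustrative sets.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]            # hypothetical toy labels
    probs = [0.9, 0.4, 0.2, 0.6]     # hypothetical model outputs
    preds = [int(p >= 0.5) for p in probs]  # assumed 0.5 threshold
    print(binary_cross_entropy(labels, probs))
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, probs))
```

With these toy values, thresholding gives one each of TP, FN, TN and FP, so accuracy, sensitivity and specificity are all 0.5 and the MCC is 0, while `roc_auc` returns 0.75 because three of the four positive/negative pairs are ranked correctly. This illustrates why the description reports several metrics: accuracy alone can hide how a classifier trades sensitivity against specificity.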
format Online
Article
Text
id pubmed-5747425
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5747425 2018-01-26 On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach. Qu, Yu-Hui; Yu, Hua; Gong, Xiu-Jun; Xu, Jia-Hui; Lee, Hong-Shun. PLoS One. Research Article.
Public Library of Science 2017-12-29 /pmc/articles/PMC5747425/ /pubmed/29287069 http://dx.doi.org/10.1371/journal.pone.0188129 Text en © 2017 Qu et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5747425/
https://www.ncbi.nlm.nih.gov/pubmed/29287069
http://dx.doi.org/10.1371/journal.pone.0188129