Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks

Accurate computational identification of promoters remains a challenge as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence c...

Bibliographic Details
Main authors: Umarov, Ramzan Kh., Solovyev, Victor V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5291440/
https://www.ncbi.nlm.nih.gov/pubmed/28158264
http://dx.doi.org/10.1371/journal.pone.0171410
_version_ 1782504779880071168
author Umarov, Ramzan Kh.
Solovyev, Victor V.
collection PubMed
description Accurate computational identification of promoters remains a challenge, as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we use Convolutional Neural Networks (CNNs) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build predictive models of them. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The CNN model for identifying Bacillus subtilis promoters achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs to identify two well-known promoter classes (TATA and non-TATA promoters). The CNN models recognize these complex functional regions well. For human promoters, the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 on TATA and 0.90/0.98/0.89 on non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA) and 0.94/0.94/0.86 (non-TATA). Thus the developed CNN models, implemented in the CNNProm program, demonstrate the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy than previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can easily be extended to identify promoters and other complex functional regions in sequences of many other genomes, especially newly sequenced ones. The CNNProm program is available to run at the web server http://www.softberry.com.
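The description reports Sn, Sp and CC without defining them. The sketch below assumes the definitions commonly used in promoter prediction work: Sn is sensitivity, Sp is specificity, and CC is the Matthews correlation coefficient. The function name and the example counts are hypothetical and are not taken from the article.

```python
import math


def sn_sp_cc(tp, tn, fp, fn):
    """Return (Sn, Sp, CC) from confusion-matrix counts.

    Assumed definitions (not stated in this record):
      Sn = TP / (TP + FN)   sensitivity
      Sp = TN / (TN + FP)   specificity
      CC = Matthews correlation coefficient
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # CC is undefined when any marginal total is zero; report 0.0 in that case.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, cc


# Hypothetical counts for illustration only (not data from the article).
sn, sp, cc = sn_sp_cc(tp=80, tn=150, fp=50, fn=20)
print(f"Sn={sn:.2f} Sp={sp:.2f} CC={cc:.2f}")
```

Under these definitions CC depends on the ratio of promoter to non-promoter sequences in the test set, so two models with identical Sn and Sp can report different CC values.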
format Online
Article
Text
id pubmed-5291440
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5291440 2017-02-17 Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks Umarov, Ramzan Kh. Solovyev, Victor V. PLoS One Research Article Public Library of Science 2017-02-03 /pmc/articles/PMC5291440/ /pubmed/28158264 http://dx.doi.org/10.1371/journal.pone.0171410 Text en © 2017 Umarov, Solovyev http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks
topic Research Article