
PromoterPredict: sequence-based modelling of Escherichia coli σ(70) promoter strength yields logarithmic dependence between promoter strength and sequence

We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ(70) factor of RNA polymerase. σ(70) promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both ti...


Bibliographic Details
Main authors: Bharanikumar, Ramit; Premkumar, Keshav Aditya R.; Palaniappan, Ashok
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6228582/
https://www.ncbi.nlm.nih.gov/pubmed/30425888
http://dx.doi.org/10.7717/peerj.5862
author Bharanikumar, Ramit
Premkumar, Keshav Aditya R.
Palaniappan, Ashok
collection PubMed
description We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ(70) factor of RNA polymerase. σ(70) promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. We parsed a comprehensive database of bacterial promoters for the −35 and −10 hexamer regions of σ(70)-binding promoters and used these sequences to construct the respective position weight matrices (PWM). Next we used a well-characterized set of promoters to train a multivariate linear regression model and learn the mapping between PWM scores of the −35 and −10 hexamers and the promoter strength. We found that the log of the promoter strength is significantly linearly associated with a weighted sum of the −10 and −35 sequence profile scores. We applied our model to 100 sets of 100 randomly generated promoter sequences to generate a sampling distribution of mean strengths of random promoter sequences and obtained a mean of 6E-4 ± 1E-7. Our model was further validated by cross-validation and on independent datasets of characterized promoters. PromoterPredict accepts −10 and −35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more robust estimates of promoter strength. PromoterPredict is available as both a web service (https://promoterpredict.com) and standalone tool (https://github.com/PromoterPredict). Our work presents an intuitive generalization applicable to modelling the strength of other promoter classes.
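The modelling steps described above (position weight matrices for the −35 and −10 hexamers, then a linear regression of log strength on the two PWM scores) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the pseudocount, the uniform background frequency, the use of the natural log, and all toy sequences and strength values are assumptions made for the example.

```python
import numpy as np

BASES = "ACGT"

def build_pwm(hexamers, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix (4 x L) from aligned, equal-length sequences.

    The pseudocount and the uniform background are assumptions of this sketch.
    """
    length = len(hexamers[0])
    counts = np.full((4, length), pseudocount, dtype=float)
    for seq in hexamers:
        for pos, base in enumerate(seq.upper()):
            counts[BASES.index(base), pos] += 1
    freqs = counts / counts.sum(axis=0)  # normalise each column to frequencies
    return np.log2(freqs / background)

def pwm_score(pwm, seq):
    """Sum of the per-position log-odds entries matching the sequence."""
    return float(sum(pwm[BASES.index(b), i] for i, b in enumerate(seq.upper())))

def fit_log_linear(scores_35, scores_10, strengths):
    """Least-squares fit of ln(strength) = b0 + b1*score_35 + b2*score_10."""
    X = np.column_stack([np.ones(len(scores_35)), scores_35, scores_10])
    y = np.log(np.asarray(strengths, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_strength(coef, score_35, score_10):
    """Map a pair of PWM scores back to a strength on the original scale."""
    return float(np.exp(coef[0] + coef[1] * score_35 + coef[2] * score_10))

# Hypothetical toy data, for illustration only (not taken from the paper).
minus35 = ["TTGACA", "TTGACA", "TTTACA", "TTGCCA"]
minus10 = ["TATAAT", "TATAAT", "TAAAAT", "TATACT"]
pwm35, pwm10 = build_pwm(minus35), build_pwm(minus10)

training = [  # (-35 hexamer, -10 hexamer, made-up relative strength)
    ("TTGACA", "TATAAT", 1.0),
    ("TTTACA", "TATAAT", 0.4),
    ("TTGACA", "TAAAAT", 0.3),
    ("TTGCCA", "TATACT", 0.1),
]
s35 = [pwm_score(pwm35, a) for a, _, _ in training]
s10 = [pwm_score(pwm10, b) for _, b, _ in training]
coef = fit_log_linear(s35, s10, [s for _, _, s in training])

query = predict_strength(coef, pwm_score(pwm35, "TTGACA"), pwm_score(pwm10, "TATACT"))
print("predicted strength for toy query:", query)
```

Fitting on the log scale and exponentiating the prediction mirrors the reported finding that the log of promoter strength, rather than the strength itself, is linear in the weighted sum of the two profile scores.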
format Online
Article
Text
id pubmed-6228582
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6228582 2018-11-13 PromoterPredict: sequence-based modelling of Escherichia coli σ(70) promoter strength yields logarithmic dependence between promoter strength and sequence Bharanikumar, Ramit Premkumar, Keshav Aditya R. Palaniappan, Ashok PeerJ Bioinformatics PeerJ Inc. 
2018-11-07 /pmc/articles/PMC6228582/ /pubmed/30425888 http://dx.doi.org/10.7717/peerj.5862 Text en © 2018 Bharanikumar et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title PromoterPredict: sequence-based modelling of Escherichia coli σ(70) promoter strength yields logarithmic dependence between promoter strength and sequence
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6228582/
https://www.ncbi.nlm.nih.gov/pubmed/30425888
http://dx.doi.org/10.7717/peerj.5862