
Prediction of fine-tuned promoter activity from DNA sequence

The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dia...


Bibliographic Details
Main authors: Siwo, Geoffrey, Rider, Andrew, Tan, Asako, Pinapati, Richard, Emrich, Scott, Chawla, Nitesh, Ferdig, Michael
Format: Online Article Text
Language: English
Published: F1000Research 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4916984/
https://www.ncbi.nlm.nih.gov/pubmed/27347373
http://dx.doi.org/10.12688/f1000research.7485.1
author Siwo, Geoffrey
Rider, Andrew
Tan, Asako
Pinapati, Richard
Emrich, Scott
Chawla, Nitesh
Ferdig, Michael
collection PubMed
description The quantitative prediction of the transcriptional activity of genes from promoter sequence is fundamental to engineering biological systems for industrial purposes and to understanding natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity, given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties, and spatial positions of promoter sequences, we achieved best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. In a test set that was hidden from participants, our method predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) more accurately than that of 33 laboratory-mutated variants of the promoters (r = 0.57). Notably, our model differed substantially from the rest in two main ways: i) it did not explicitly use transcription factor binding information, implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was based entirely on features extracted from the 100 bp region upstream of the translational start site, demonstrating that this region encodes much of the overall promoter activity.
The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.
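The kinds of sequence features named in the description can be illustrated with a minimal Python sketch. This is not the authors' code: the function names (`kmer_frequencies`, `longest_tract`, `promoter_features`) are hypothetical, the record does not say which 6 trinucleotides and 12 tetranucleotides were selected, and the protein deformability feature is omitted because no scale is given here. The sketch simply computes the G frequency, the longest T and TA tracts, and all 3-mer and 4-mer frequencies for one sequence.

```python
import re
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer, counted over overlapping windows."""
    seq = seq.upper()
    n_windows = max(len(seq) - k + 1, 0)
    counts = Counter(seq[i:i + k] for i in range(n_windows))
    freqs = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        freqs[kmer] = counts[kmer] / n_windows if n_windows else 0.0
    return freqs


def longest_tract(seq, unit):
    """Length in bp of the longest run of back-to-back copies of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    return max((len(run) for run in runs), default=0)


def promoter_features(seq):
    """Illustrative feature vector for one promoter sequence (assumed A/C/G/T only)."""
    seq = seq.upper()
    features = {
        "freq_G": seq.count("G") / len(seq) if seq else 0.0,
        "longest_T_tract": longest_tract(seq, "T"),
        "longest_TA_tract": longest_tract(seq, "TA"),
    }
    # All tri- and tetranucleotides are computed; a real model would keep
    # only the subset chosen by its feature search.
    for k in (3, 4):
        for kmer, freq in kmer_frequencies(seq, k).items():
            features[f"freq_{kmer}"] = freq
    return features
```

Calling `promoter_features` on the 100 bp upstream of each translational start site would give one feature dictionary per promoter, which could then be passed to any regression model and scored against measured activity with a rank correlation.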
format Online
Article
Text
id pubmed-4916984
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-4916984 2016-06-23 Prediction of fine-tuned promoter activity from DNA sequence Siwo, Geoffrey Rider, Andrew Tan, Asako Pinapati, Richard Emrich, Scott Chawla, Nitesh Ferdig, Michael F1000Res Method Article F1000Research 2016-02-11 /pmc/articles/PMC4916984/ /pubmed/27347373 http://dx.doi.org/10.12688/f1000research.7485.1 Text en Copyright: © 2016 Siwo G et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Prediction of fine-tuned promoter activity from DNA sequence
topic Method Article