Nucleotide patterns aiding in prediction of eukaryotic promoters
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In t...
Main authors: | Triska, Martin; Solovyev, Victor; Baranova, Ancha; Kel, Alexander; Tatarinova, Tatiana V. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5687710/ https://www.ncbi.nlm.nih.gov/pubmed/29141011 http://dx.doi.org/10.1371/journal.pone.0187243 |
_version_ | 1783279015124008960 |
---|---|
author | Triska, Martin Solovyev, Victor Baranova, Ancha Kel, Alexander Tatarinova, Tatiana V. |
author_facet | Triska, Martin Solovyev, Victor Baranova, Ancha Kel, Alexander Tatarinova, Tatiana V. |
author_sort | Triska, Martin |
collection | PubMed |
description | Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that probabilistic integrative algorithms-driven models allow accurate classification of DNA sequence into “promoters” and “non-promoters” even in absence of the full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 “promoter-specific” transcription factors), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 of the “promiscuous” transcription factors with little or no location preference with respect to TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots. |
format | Online Article Text |
id | pubmed-5687710 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5687710 2017-11-30 Nucleotide patterns aiding in prediction of eukaryotic promoters Triska, Martin Solovyev, Victor Baranova, Ancha Kel, Alexander Tatarinova, Tatiana V. PLoS One Research Article Public Library of Science 2017-11-15 /pmc/articles/PMC5687710/ /pubmed/29141011 http://dx.doi.org/10.1371/journal.pone.0187243 Text en © 2017 Triska et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Nucleotide patterns aiding in prediction of eukaryotic promoters |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5687710/ https://www.ncbi.nlm.nih.gov/pubmed/29141011 http://dx.doi.org/10.1371/journal.pone.0187243 |
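
The description above groups transcription factors by where their binding sites fall relative to the TSS: the [-500,0] region, the [0,500] region, or no clear preference. The following is a minimal sketch of that three-way labelling, not the authors' positional clustering method. It uses a simple fraction cutoff instead. The function name `classify_tf`, the `min_fraction` threshold, the example positions, and the choice to count position 0 as downstream are all assumptions made for illustration.

```python
def classify_tf(positions, window=500, min_fraction=0.7):
    """Label a transcription factor by the location of its binding sites.

    positions: binding-site coordinates relative to the TSS (TSS = 0).
    window: half-width of the region considered on each side of the TSS.
    min_fraction: hypothetical cutoff for calling a positional preference;
                  this value is NOT taken from the paper.
    """
    # Sites upstream of the TSS: [-window, 0)
    upstream = sum(1 for p in positions if -window <= p < 0)
    # Sites at or downstream of the TSS: [0, window]
    # (assumption: position 0 is counted as downstream)
    downstream = sum(1 for p in positions if 0 <= p <= window)

    total = upstream + downstream
    if total == 0:
        return "no sites in window"
    if upstream / total >= min_fraction:
        return "promoter-specific"
    if downstream / total >= min_fraction:
        return "5' UTR-specific"
    return "promiscuous"


# Made-up example positions, for illustration only.
example_sites = {
    "TF_A": [-450, -300, -120, -80, 40],
    "TF_B": [-30, 10, 60, 220, 480],
    "TF_C": [-400, -100, 150, 350],
}

labels = {name: classify_tf(pos) for name, pos in example_sites.items()}
for name, label in labels.items():
    print(name, label)
```

A fraction cutoff is the simplest way to express "binds preferentially to" a region. A real analysis would more likely test the positional distribution against a background model before assigning a class.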