Nucleotide patterns aiding in prediction of eukaryotic promoters

Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In t...

Bibliographic Details
Main authors: Triska, Martin; Solovyev, Victor; Baranova, Ancha; Kel, Alexander; Tatarinova, Tatiana V.
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5687710/
https://www.ncbi.nlm.nih.gov/pubmed/29141011
http://dx.doi.org/10.1371/journal.pone.0187243
author Triska, Martin
Solovyev, Victor
Baranova, Ancha
Kel, Alexander
Tatarinova, Tatiana V.
collection PubMed
description Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites (TSS) and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into “promoters” and “non-promoters” even in the absence of full-length cDNA sequences. These models may be built upon maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, and transcription factor binding sites, as well as the relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors (TFs): those that bind preferentially to the [-500,0] region (188 “promoter-specific” TFs), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 “promiscuous” TFs with little or no location preference with respect to the TSS. For the most informative motifs, positional preferences are conserved between dicots and monocots.
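The three-way grouping of transcription factors described in the abstract can be illustrated with a small sketch. This is not the authors' method: the abstract only says that positional clustering of binding sites was used, so the simple fraction rule below, the function name `classify_tf`, the 0.7 threshold, and the choice to skip sites at position 0 (which belongs to both published intervals) are all assumptions made for illustration.

```python
def classify_tf(positions, window=500, min_fraction=0.7):
    """Assign one transcription factor to a positional class.

    positions    -- binding-site coordinates relative to the TSS (TSS = 0)
    window       -- half-width of the region examined (500 bp, as in the abstract)
    min_fraction -- hypothetical cutoff for calling a positional preference
    """
    # Count sites upstream of the TSS, i.e. in [-window, 0).
    upstream = sum(1 for p in positions if -window <= p < 0)
    # Count sites downstream of the TSS, i.e. in (0, window].
    downstream = sum(1 for p in positions if 0 < p <= window)
    total = upstream + downstream
    if total == 0:
        return "unclassified"  # no sites inside the examined window
    if upstream / total >= min_fraction:
        return "promoter-specific"
    if downstream / total >= min_fraction:
        return "5' UTR-specific"
    return "promiscuous"  # no clear preference on either side of the TSS


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    print(classify_tf([-400, -300, -20, 50]))
    print(classify_tf([10, 200, 450, -5]))
    print(classify_tf([-100, 100]))
```

A real analysis would replace the fixed cutoff with a statistical test or clustering over many genes. The sketch only shows how the three labels relate to the [-500,0] and [0,500] regions.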
format Online
Article
Text
id pubmed-5687710
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5687710 2017-11-30 Nucleotide patterns aiding in prediction of eukaryotic promoters Triska, Martin Solovyev, Victor Baranova, Ancha Kel, Alexander Tatarinova, Tatiana V. PLoS One Research Article
Public Library of Science 2017-11-15 /pmc/articles/PMC5687710/ /pubmed/29141011 http://dx.doi.org/10.1371/journal.pone.0187243 Text en © 2017 Triska et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Nucleotide patterns aiding in prediction of eukaryotic promoters
topic Research Article