Systematic Clustering of Transcription Start Site Landscapes

Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs where some are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters and earl...

Bibliographic Details
Main Authors: Zhao, Xiaobei; Valen, Eivind; Parker, Brian J.; Sandelin, Albin
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3160847/
https://www.ncbi.nlm.nih.gov/pubmed/21887249
http://dx.doi.org/10.1371/journal.pone.0023409
_version_ 1782210589192355840
author Zhao, Xiaobei
Valen, Eivind
Parker, Brian J.
Sandelin, Albin
author_facet Zhao, Xiaobei
Valen, Eivind
Parker, Brian J.
Sandelin, Albin
author_sort Zhao, Xiaobei
collection PubMed
description Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs where some are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that the TSSDs have biological implications in both regulation and function. However, no systematic study has been made to explore how many types of TSSDs, and by extension core promoters, exist and to understand which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that, in addition to the previous broad/sharp dichotomy, an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, which have confounded earlier analyses, due to the high similarity of ribosomal gene promoters in combination with the known G addition bias in CAGE libraries. Thus, previous two-class separations of promoters based on TSS distributions are motivated, but the ultra-sharp TSS distributions will confound downstream analyses if not removed.
format Online
Article
Text
id pubmed-3160847
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3160847 2011-09-01 Systematic Clustering of Transcription Start Site Landscapes Zhao, Xiaobei Valen, Eivind Parker, Brian J. Sandelin, Albin PLoS One Research Article
Public Library of Science 2011-08-24 /pmc/articles/PMC3160847/ /pubmed/21887249 http://dx.doi.org/10.1371/journal.pone.0023409 Text en Zhao et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Zhao, Xiaobei
Valen, Eivind
Parker, Brian J.
Sandelin, Albin
Systematic Clustering of Transcription Start Site Landscapes
title Systematic Clustering of Transcription Start Site Landscapes
title_full Systematic Clustering of Transcription Start Site Landscapes
title_fullStr Systematic Clustering of Transcription Start Site Landscapes
title_full_unstemmed Systematic Clustering of Transcription Start Site Landscapes
title_short Systematic Clustering of Transcription Start Site Landscapes
title_sort systematic clustering of transcription start site landscapes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3160847/
https://www.ncbi.nlm.nih.gov/pubmed/21887249
http://dx.doi.org/10.1371/journal.pone.0023409
work_keys_str_mv AT zhaoxiaobei systematicclusteringoftranscriptionstartsitelandscapes
AT valeneivind systematicclusteringoftranscriptionstartsitelandscapes
AT parkerbrianj systematicclusteringoftranscriptionstartsitelandscapes
AT sandelinalbin systematicclusteringoftranscriptionstartsitelandscapes