Integration of high-resolution promoter profiling assays reveals novel, cell type–specific transcription start sites across 115 human cell and tissue types

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they...

Bibliographic Details
Main authors: Moore, Jill E., Zhang, Xiao-Ou, Elhajjajy, Shaimae I., Fan, Kaili, Pratt, Henry E., Reese, Fairlie, Mortazavi, Ali, Weng, Zhiping
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8805725/
https://www.ncbi.nlm.nih.gov/pubmed/34949670
http://dx.doi.org/10.1101/gr.275723.121
_version_ 1784643288514428928
author Moore, Jill E.
Zhang, Xiao-Ou
Elhajjajy, Shaimae I.
Fan, Kaili
Pratt, Henry E.
Reese, Fairlie
Mortazavi, Ali
Weng, Zhiping
author_sort Moore, Jill E.
collection PubMed
description Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type–specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3′ ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.
format Online
Article
Text
id pubmed-8805725
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8805725 2022-08-01 Genome Res Resource
Cold Spring Harbor Laboratory Press 2022-02 /pmc/articles/PMC8805725/ /pubmed/34949670 http://dx.doi.org/10.1101/gr.275723.121 Text en © 2022 Moore et al.; Published by Cold Spring Harbor Laboratory Press https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see https://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Integration of high-resolution promoter profiling assays reveals novel, cell type–specific transcription start sites across 115 human cell and tissue types
topic Resource
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8805725/
https://www.ncbi.nlm.nih.gov/pubmed/34949670
http://dx.doi.org/10.1101/gr.275723.121