Most “Dark Matter” Transcripts Are Associated With Known Genes

A series of reports over the last few years have indicated that a much larger portion of the mammalian genome is transcribed than can be accounted for by currently annotated genes, but the quantity and nature of these additional transcripts remains unclear. Here, we have used data from single- and p...

Bibliographic Details
Main authors: van Bakel, Harm; Nislow, Corey; Blencowe, Benjamin J.; Hughes, Timothy R.
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2872640/
https://www.ncbi.nlm.nih.gov/pubmed/20502517
http://dx.doi.org/10.1371/journal.pbio.1000371
author van Bakel, Harm
Nislow, Corey
Blencowe, Benjamin J.
Hughes, Timothy R.
collection PubMed
description A series of reports over the last few years have indicated that a much larger portion of the mammalian genome is transcribed than can be accounted for by currently annotated genes, but the quantity and nature of these additional transcripts remains unclear. Here, we have used data from single- and paired-end RNA-Seq and tiling arrays to assess the quantity and composition of transcripts in PolyA+ RNA from human and mouse tissues. Relative to tiling arrays, RNA-Seq identifies many fewer transcribed regions (“seqfrags”) outside known exons and ncRNAs. Most nonexonic seqfrags are in introns, raising the possibility that they are fragments of pre-mRNAs. The chromosomal locations of the majority of intergenic seqfrags in RNA-Seq data are near known genes, consistent with alternative cleavage and polyadenylation site usage, promoter- and terminator-associated transcripts, or new alternative exons; indeed, reads that bridge splice sites identified 4,544 new exons, affecting 3,554 genes. Most of the remaining seqfrags correspond to either single reads that display characteristics of random sampling from a low-level background or several thousand small transcripts (median length = 111 bp) present at higher levels, which also tend to display sequence conservation and originate from regions with open chromatin. We conclude that, while there are bona fide new intergenic transcripts, their number and abundance is generally low in comparison to known exons, and the genome is not as pervasively transcribed as previously reported.
format Text
id pubmed-2872640
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2872640 2010-05-25 Most “Dark Matter” Transcripts Are Associated With Known Genes van Bakel, Harm Nislow, Corey Blencowe, Benjamin J. Hughes, Timothy R. PLoS Biol Research Article Public Library of Science 2010-05-18 /pmc/articles/PMC2872640/ /pubmed/20502517 http://dx.doi.org/10.1371/journal.pbio.1000371 Text en van Bakel et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Most “Dark Matter” Transcripts Are Associated With Known Genes
topic Research Article