
Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive


Bibliographic Details
Main authors: Nellore, Abhinav, Jaffe, Andrew E., Fortin, Jean-Philippe, Alquicira-Hernández, José, Collado-Torres, Leonardo, Wang, Siruo, Phillips III, Robert A., Karbhari, Nishika, Hansen, Kasper D., Langmead, Ben, Leek, Jeffrey T.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5203714/
https://www.ncbi.nlm.nih.gov/pubmed/28038678
http://dx.doi.org/10.1186/s13059-016-1118-6
Collection: PubMed
Abstract:
BACKGROUND: Gene annotations, such as those in GENCODE, are derived primarily from alignments of spliced cDNA sequences and protein sequences. The impact of RNA-seq data on annotation has been confined to major projects like ENCODE and Illumina Body Map 2.0.
RESULTS: We aligned 21,504 Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. We found 56,861 junctions (18.6%) in at least 1000 samples that were not annotated, and their expression associated with tissue type. Junctions well expressed in individual samples tended to be annotated. Newer samples contributed few novel well-supported junctions, with the vast majority of detected junctions present in samples before 2013. We compiled junction data into a resource called intropolis available at http://intropolis.rail.bio. We used this resource to search for a recently validated isoform of the ALK gene and characterized the potential functional implications of unannotated junctions with publicly available TRAP-seq data.
CONCLUSIONS: Considering only the variation contained in annotation may suffice if an investigator is interested only in well-expressed transcript isoforms. However, genes that are not generally well expressed and nonetheless present in a small but significant number of samples in the SRA are likelier to be incompletely annotated. The rate at which evidence for novel junctions has been added to the SRA has tapered dramatically, even to the point of an asymptote. Now is perhaps an appropriate time to update incomplete annotations to include splicing present in the now-stable snapshot provided by the SRA.
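As a rough illustration of the junction comparison summarized in the abstract, the Python sketch below treats a junction as unannotated when it is detected in at least a minimum number of samples but is absent from a set of annotated junctions. The tuple key (chromosome, start, end, strand), the function name `unannotated_junctions`, and the in-memory dictionaries are hypothetical choices for illustration; they are not the intropolis file format or the authors' pipeline. Computing the percentage as a share of all junctions that pass the sample threshold is likewise an assumption.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, intron start, intron end, strand).
# This layout is an illustrative assumption, not the intropolis file format.
Junction = Tuple[str, int, int, str]


def unannotated_junctions(
    junction_samples: Dict[Junction, Set[str]],
    annotated: Set[Junction],
    min_samples: int,
) -> Tuple[Set[Junction], float]:
    """Return junctions seen in >= min_samples distinct samples that are
    absent from `annotated`, plus their fraction among all junctions that
    pass the sample threshold (the denominator is an assumption)."""
    # Keep only junctions detected in at least `min_samples` distinct samples.
    well_supported = {
        j for j, samples in junction_samples.items() if len(samples) >= min_samples
    }
    # Unannotated = passes the threshold but is missing from the annotation set.
    novel = well_supported - annotated
    fraction = len(novel) / len(well_supported) if well_supported else 0.0
    return novel, fraction


# Toy usage with made-up sample IDs and a threshold of two samples.
junction_samples = {
    ("chr1", 100, 200, "+"): {"SRR1", "SRR2", "SRR3"},
    ("chr1", 300, 400, "+"): {"SRR1", "SRR2"},
    ("chr2", 500, 650, "-"): {"SRR3"},
}
annotated = {("chr1", 100, 200, "+")}
novel, frac = unannotated_junctions(junction_samples, annotated, min_samples=2)
print(sorted(novel), frac)  # prints [('chr1', 300, 400, '+')] 0.5
```

Here two junctions pass the two-sample threshold and one of them is missing from the annotation set, giving a fraction of 0.5. Because the abstract compares against several recent gene annotations, `annotated` could be built as the union of junctions drawn from each of them.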
Record ID: pubmed-5203714
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genome Biol (Research)
Publication date: 2016-12-30
License: © The Author(s) 2016. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.