
Covering all your bases: incorporating intron signal from RNA-seq data


Bibliographic Details
Main authors: Lee, Stuart; Zhang, Albert Y; Su, Shian; Ng, Ashley P; Holik, Aliaksei Z; Asselin-Labat, Marie-Liesse; Ritchie, Matthew E; Law, Charity W
Format: Online Article Text
Language: English
Published: Oxford University Press, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671406/
https://www.ncbi.nlm.nih.gov/pubmed/33575621
http://dx.doi.org/10.1093/nargab/lqaa073
Collection: PubMed
Description: RNA-seq datasets can contain millions of intron reads per library, and these reads are typically removed from downstream analysis. Only reads overlapping annotated exons are considered informative, since mature mRNA is assumed to be the major component sequenced, especially in poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal represents both pre-mRNAs and intron retention. We demonstrate how intron reads can be used in differential expression analysis with our index method, through which a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software, which quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns whose coverage levels were similar to those of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources of intron reads through exploratory data analysis, minimizing bias in the discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and for RNA-seq data in general.
Record ID: pubmed-7671406
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: NAR Genom Bioinform (Standard Article)
Publisher: Oxford University Press, 2020-09-22
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
License: http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Topic: Standard Article