Covering all your bases: incorporating intron signal from RNA-seq data
RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In th...
Main Authors: | Lee, Stuart; Zhang, Albert Y; Su, Shian; Ng, Ashley P; Holik, Aliaksei Z; Asselin-Labat, Marie-Liesse; Ritchie, Matthew E; Law, Charity W |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Standard Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671406/ https://www.ncbi.nlm.nih.gov/pubmed/33575621 http://dx.doi.org/10.1093/nargab/lqaa073 |
_version_ | 1783610923036966912 |
---|---|
author | Lee, Stuart Zhang, Albert Y Su, Shian Ng, Ashley P Holik, Aliaksei Z Asselin-Labat, Marie-Liesse Ritchie, Matthew E Law, Charity W |
author_facet | Lee, Stuart Zhang, Albert Y Su, Shian Ng, Ashley P Holik, Aliaksei Z Asselin-Labat, Marie-Liesse Ritchie, Matthew E Law, Charity W |
author_sort | Lee, Stuart |
collection | PubMed |
description | RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method, where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software, which quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had coverage levels similar to those of neighbouring exons. The work and ideas presented in this paper are the first of their kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general. |
format | Online Article Text |
id | pubmed-7671406 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-76714062021-02-10 Covering all your bases: incorporating intron signal from RNA-seq data Lee, Stuart Zhang, Albert Y Su, Shian Ng, Ashley P Holik, Aliaksei Z Asselin-Labat, Marie-Liesse Ritchie, Matthew E Law, Charity W NAR Genom Bioinform Standard Article RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered to be informative since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative, and through exploratory data analysis of read coverage that intron signal is representative of both pre-mRNAs and intron retention. We demonstrate how intron reads can be utilized in differential expression analysis using our index method where a unique set of differentially expressed genes can be detected using intron counts. In exploring read coverage, we also developed the superintronic software that quickly and robustly calculates user-defined summary statistics for exonic and intronic regions. Across multiple datasets, superintronic enabled us to identify several genes with distinctly retained introns that had similar coverage levels to that of neighbouring exons. The work and ideas presented in this paper is the first of its kind to consider multiple biological sources for intron reads through exploratory data analysis, minimizing bias in discovery and interpretation of results. Our findings open up possibilities for further methods development for intron reads and RNA-seq data in general. Oxford University Press 2020-09-22 /pmc/articles/PMC7671406/ /pubmed/33575621 http://dx.doi.org/10.1093/nargab/lqaa073 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Covering all your bases: incorporating intron signal from RNA-seq data |
title_sort | covering all your bases: incorporating intron signal from rna-seq data |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671406/ https://www.ncbi.nlm.nih.gov/pubmed/33575621 http://dx.doi.org/10.1093/nargab/lqaa073 |