Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap

Bibliographic Details

Main authors: Zhao, Shanrong; Zhang, Ying; Gordon, William; Quan, Jie; Xi, Hualin; Du, Sarah; von Schack, David; Zhang, Baohong
Format: Online Article Text
Language: English
Published: BioMed Central, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4559181/
https://www.ncbi.nlm.nih.gov/pubmed/26334759
http://dx.doi.org/10.1186/s12864-015-1876-7

Abstract:

BACKGROUND: While RNA sequencing (RNA-seq) is becoming a powerful technology for transcriptome profiling, one significant shortcoming of the first-generation RNA-seq protocol is that it does not retain the strand of origin of each transcript. Without strand information, it is difficult and sometimes impossible to accurately quantify the expression levels of genes with overlapping genomic loci that are transcribed from opposite strands. Strand information can now be retained by modifying the RNA-seq protocol; the modified protocol is known as strand-specific or stranded RNA-seq. Here, we evaluated the advantages of stranded over non-stranded RNA-seq in transcriptome profiling of whole blood RNA samples, and investigated the influence of gene overlaps on gene expression profiling results, both in practical RNA-seq datasets and from a theoretical perspective.

RESULTS: Our results demonstrated a substantial impact of stranded RNA-seq on transcriptome profiling and gene expression measurements. As many as 1751 genes in Gencode Release 19 were identified as differentially expressed when comparing stranded and non-stranded RNA-seq of whole blood samples. Antisense genes and pseudogenes were significantly enriched among the differentially expressed genes. Because stranded RNA-seq retains the strand of origin of each read, it can resolve read ambiguity in overlapping genes transcribed from opposite strands, which provides more accurate quantification of gene expression levels than traditional non-stranded RNA-seq. In the human genome, it is not uncommon to find genomic loci where both strands encode distinct genes. Of the more than 57,800 annotated genes in Gencode Release 19, an estimated 19 % (about 11,000) overlap a gene transcribed from the opposite strand. Based on our whole blood mRNA-seq datasets, the fractions of overlapping nucleotide bases on the same and opposite strands were estimated at 2.94 % and 3.1 %, respectively. The corresponding theoretical estimates, 3 % and 3.6 %, are in good agreement with these findings.

CONCLUSIONS: Stranded RNA-seq provides a more accurate estimate of transcript expression than non-stranded RNA-seq and is therefore the recommended approach for future mRNA-seq studies.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1876-7) contains supplementary material, which is available to authorized users.
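The read-ambiguity argument above can be illustrated with a small toy example. This Python sketch is not the authors' pipeline: the gene names, coordinates, and the helpers `assign_read` and `opposite_strand_overlaps` are hypothetical. It assumes 0-based half-open intervals and that each read's `strand` field already holds the strand of the transcript it came from (with dUTP-type stranded libraries this usually means flipping the alignment strand of read 1). Reads that hit zero or several genes are discarded, a conservative rule chosen here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # 0-based, half-open interval [start, end)
    end: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int
    end: int
    strand: str  # assumed: strand of the source transcript, not the raw alignment strand


def overlaps(chrom_a: str, start_a: int, end_a: int,
             chrom_b: str, start_b: int, end_b: int) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def assign_read(read: Read, genes: List[Gene], stranded: bool) -> Optional[str]:
    """Return the one gene a read can be assigned to, or None if it hits zero or several genes."""
    hits = [
        g for g in genes
        if overlaps(read.chrom, read.start, read.end, g.chrom, g.start, g.end)
        # Without strand information, genes on either strand are candidates.
        and (not stranded or g.strand == read.strand)
    ]
    return hits[0].name if len(hits) == 1 else None


def opposite_strand_overlaps(genes: List[Gene]) -> Set[str]:
    """Names of genes overlapping at least one gene on the opposite strand (naive O(n^2) scan)."""
    names: Set[str] = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if a.strand != b.strand and overlaps(a.chrom, a.start, a.end,
                                                  b.chrom, b.start, b.end):
                names.update((a.name, b.name))
    return names


if __name__ == "__main__":
    # Hypothetical toy annotation: GENE_A and GENE_B overlap on opposite strands.
    genes = [
        Gene("GENE_A", "chr1", 100, 500, "+"),
        Gene("GENE_B", "chr1", 400, 900, "-"),
        Gene("GENE_C", "chr1", 2000, 2500, "+"),
    ]
    reads = [
        Read("chr1", 420, 470, "-"),   # falls inside the GENE_A/GENE_B overlap
        Read("chr1", 150, 200, "+"),
        Read("chr1", 2100, 2150, "+"),
    ]
    for stranded in (False, True):
        counts = Counter(assign_read(r, genes, stranded) for r in reads)
        counts.pop(None, None)  # drop ambiguous or unassigned reads
        print("stranded" if stranded else "non-stranded", dict(counts))

    flagged = opposite_strand_overlaps(genes)
    print(f"{len(flagged)} of {len(genes)} genes overlap an opposite-strand gene")
```

Without strand information, the read at chr1:420-470 overlaps both GENE_A and GENE_B and is discarded, so GENE_B is undercounted; with strand information it is assigned to GENE_B alone. Established read counters expose the same choice, for example the `--stranded` option of htseq-count and the `-s` option of featureCounts.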

Journal: BMC Genomics (Research Article)
Published online: 2015-09-03
Source record: pubmed-4559181 (PubMed, National Center for Biotechnology Information)
Copyright: © Zhao et al. 2015. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.