Cargando…

Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data

BACKGROUND: Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bio...

Descripción completa

Detalles Bibliográficos
Autores principales: Bai, Yongsheng, Kinne, Jeff, Donham, Brandon, Jiang, Feng, Ding, Lizhong, Hassler, Justin R., Kaufman, Randal J.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001233/
https://www.ncbi.nlm.nih.gov/pubmed/27556805
http://dx.doi.org/10.1186/s12864-016-2896-7
_version_ 1782450436327866368
author Bai, Yongsheng
Kinne, Jeff
Donham, Brandon
Jiang, Feng
Ding, Lizhong
Hassler, Justin R.
Kaufman, Randal J.
author_facet Bai, Yongsheng
Kinne, Jeff
Donham, Brandon
Jiang, Feng
Ding, Lizhong
Hassler, Justin R.
Kaufman, Randal J.
author_sort Bai, Yongsheng
collection PubMed
description BACKGROUND: Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the gene encoding the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame-shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed “Read-Split-Walk” (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we have developed an improved algorithm “Read-Split-Run” (RSR) for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm using different combinations of several parameters to the previously RSW tested mouse embryonic fibroblast cells (MEF) and the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other alternative splicing events identification tools (TopHat (Trapnell et al., Bioinformatics 25:1105–1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012)) utilizing the context of the spliced Xbp1 mRNA as a positive control in the data sets we identified it to be the top cleavage target present in Ire1α(+/−) but absent in Ire1α(−/−) MEF samples and this comparison was also extended to human ENCODE RNA-Seq data. RESULTS: Proof of principle came in our results by the fact that the 26 nt non-conventional splice site in Xbp1 was detected as the top hit by our new RSR algorithm in heterozygote (Het) samples from both Thapsigargin (Tg) and Dithiothreitol (Dtt) treated experiments but absent in the negative control Ire1α knock-out (KO) samples. Applying different combinations of parameters to the mouse MEF RNA-Seq data, we suggest a General Linear Model (GLM) for both Tg and Dtt treated experiments. We also ran RSR for a human ENCODE RNA-Seq dataset and identified 32,597 spliced regions for regular chromosomes. TopHat (Trapnell et al., Bioinformatics 25:1105–1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012) identified 237,155 spliced junctions and 9,129 exon skipping events (excluding chr14), respectively. Our Read-Split-Run algorithm also outperformed others in the context of ranking Xbp1 gene as the top cleavage target present in Ire1α(+/−) but absent in Ire1α(−/−) MEF samples. The RSR package including source codes is available at http://bioinf1.indstate.edu/RSR and its pipeline source codes are also freely available at https://github.com/xuric/read-split-run for academic use. CONCLUSIONS: Our new RSR algorithm has the capability of processing massive amounts of human ENCODE RNA-Seq data for identifying novel splice junction sites at a genome-wide level in a much more efficient manner when compared to the previous RSW algorithm. Our proposed model can also predict the number of spliced regions under any combinations of parameters. Our pipeline can detect novel spliced sites for other species using RNA-Seq data generated under similar conditions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2896-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5001233
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-50012332016-09-06 Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data Bai, Yongsheng Kinne, Jeff Donham, Brandon Jiang, Feng Ding, Lizhong Hassler, Justin R. Kaufman, Randal J. BMC Genomics Research BACKGROUND: Most existing tools for detecting next-generation sequencing-based splicing events focus on generic splicing events. Consequently, special types of non-canonical splicing events of short mRNA regions (IRE1α targeted) have not yet been thoroughly addressed at a genome-wide level using bioinformatics approaches in conjunction with next-generation technologies. During endoplasmic reticulum (ER) stress, the gene encoding the RNase Ire1α is known to splice out a short 26 nt region from the mRNA of the transcription factor Xbp1 non-canonically within the cytosol. This causes an open reading frame-shift that induces expression of many downstream genes in reaction to ER stress as part of the unfolded protein response (UPR). We previously published an algorithm termed “Read-Split-Walk” (RSW) to identify non-canonical splicing regions using RNA-Seq data and applied it to ER stress-induced Ire1α heterozygote and knockout mouse embryonic fibroblast cell lines. In this study, we have developed an improved algorithm “Read-Split-Run” (RSR) for detecting genome-wide Ire1α-targeted genes with non-canonical spliced regions at a faster speed. We applied the RSR algorithm using different combinations of several parameters to the previously RSW tested mouse embryonic fibroblast cells (MEF) and the human Encyclopedia of DNA Elements (ENCODE) RNA-Seq data. We also compared the performance of RSR with two other alternative splicing events identification tools (TopHat (Trapnell et al., Bioinformatics 25:1105–1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012)) utilizing the context of the spliced Xbp1 mRNA as a positive control in the data sets we identified it to be the top cleavage target present in Ire1α(+/−) but absent in Ire1α(−/−) MEF samples and this comparison was also extended to human ENCODE RNA-Seq data. RESULTS: Proof of principle came in our results by the fact that the 26 nt non-conventional splice site in Xbp1 was detected as the top hit by our new RSR algorithm in heterozygote (Het) samples from both Thapsigargin (Tg) and Dithiothreitol (Dtt) treated experiments but absent in the negative control Ire1α knock-out (KO) samples. Applying different combinations of parameters to the mouse MEF RNA-Seq data, we suggest a General Linear Model (GLM) for both Tg and Dtt treated experiments. We also ran RSR for a human ENCODE RNA-Seq dataset and identified 32,597 spliced regions for regular chromosomes. TopHat (Trapnell et al., Bioinformatics 25:1105–1111, 2009) and Alt Event Finder (Zhou et al., BMC Genomics 13:S10, 2012) identified 237,155 spliced junctions and 9,129 exon skipping events (excluding chr14), respectively. Our Read-Split-Run algorithm also outperformed others in the context of ranking Xbp1 gene as the top cleavage target present in Ire1α(+/−) but absent in Ire1α(−/−) MEF samples. The RSR package including source codes is available at http://bioinf1.indstate.edu/RSR and its pipeline source codes are also freely available at https://github.com/xuric/read-split-run for academic use. CONCLUSIONS: Our new RSR algorithm has the capability of processing massive amounts of human ENCODE RNA-Seq data for identifying novel splice junction sites at a genome-wide level in a much more efficient manner when compared to the previous RSW algorithm. Our proposed model can also predict the number of spliced regions under any combinations of parameters. Our pipeline can detect novel spliced sites for other species using RNA-Seq data generated under similar conditions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2896-7) contains supplementary material, which is available to authorized users. BioMed Central 2016-08-22 /pmc/articles/PMC5001233/ /pubmed/27556805 http://dx.doi.org/10.1186/s12864-016-2896-7 Text en © The Author(s). 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Bai, Yongsheng
Kinne, Jeff
Donham, Brandon
Jiang, Feng
Ding, Lizhong
Hassler, Justin R.
Kaufman, Randal J.
Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title_full Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title_fullStr Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title_full_unstemmed Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title_short Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data
title_sort read-split-run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using rna-seq data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001233/
https://www.ncbi.nlm.nih.gov/pubmed/27556805
http://dx.doi.org/10.1186/s12864-016-2896-7
work_keys_str_mv AT baiyongsheng readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT kinnejeff readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT donhambrandon readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT jiangfeng readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT dinglizhong readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT hasslerjustinr readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata
AT kaufmanrandalj readsplitrunanimprovedbioinformaticspipelineforidentificationofgenomewidenoncanonicalsplicedregionsusingrnaseqdata