Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly

BACKGROUND: It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, the question of whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years next-generation seque...

Bibliographic Details
Main Authors: Bai, Yongsheng, Kinne, Jeff, Ding, Lizhong, Rath, Ethan C., Cox, Aaron, Naidu, Siva Dharman
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629565/
https://www.ncbi.nlm.nih.gov/pubmed/28984182
http://dx.doi.org/10.1186/s12859-017-1801-y
_version_ 1783269068548079616
author Bai, Yongsheng
Kinne, Jeff
Ding, Lizhong
Rath, Ethan C.
Cox, Aaron
Naidu, Siva Dharman
author_facet Bai, Yongsheng
Kinne, Jeff
Ding, Lizhong
Rath, Ethan C.
Cox, Aaron
Naidu, Siva Dharman
author_sort Bai, Yongsheng
collection PubMed
description BACKGROUND: It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years, next-generation sequencing technologies have revolutionized the field. The “Read-Split-Walk” (RSW) and “Read-Split-Run” (RSR) methods were developed to identify genome-wide non-canonical spliced regions, including special events occurring in the cytoplasm. As a significant amount of genome/transcriptome data, such as that from the Encyclopedia of DNA Elements (ENCODE) project, has been generated, we have advanced a newer, more memory-efficient version of the algorithm, “Read-Split-Fly” (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for further downstream biological function analysis. RESULTS: We used open-access ENCODE project RNA-Seq data to search spliced intron sequences against the U12-type spliced intron sequence database to examine whether some events could occur as potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5’ss and 3’ss sequences from the well-known orthologous U12-type spliceosomal intron database U12DB. Preliminary results of searching 70 ENCODE samples indicated that the presence of 5’ss with a U12-type signature is more frequent than U2-type and prevalent in non-canonical junctions reported by RSF. The selected spliced sequences have also been further studied using miRBase to elucidate their functionality. Preliminary results from 70 samples of ENCODE datasets show that several miRNAs are prevalent in the studied ENCODE samples. Two of these, hsa-miR-1273 and hsa-miR-548, are associated with many diseases and cancers, as suggested in the literature.
CONCLUSIONS: Our RSF pipeline is able to detect many possible junctions (especially those with a high RPKM) with very high overall accuracy and relatively high accuracy for novel junctions. We have incorporated useful parameter features into the pipeline, such as handling variable-length read data and searching spliced sequences for splicing signatures and miRNA events. We suggest that RSF, a tool for identifying novel splicing events, is applicable to studying a range of diseases across biological systems under different experimental conditions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1801-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5629565
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5629565 2017-10-13 Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly Bai, Yongsheng Kinne, Jeff Ding, Lizhong Rath, Ethan C. Cox, Aaron Naidu, Siva Dharman BMC Bioinformatics Research BACKGROUND: It is generally thought that most canonical or non-canonical splicing events involving U2- and U12 spliceosomes occur within nuclear pre-mRNAs. However, whether at least some U12-type splicing occurs in the cytoplasm is still unclear. In recent years, next-generation sequencing technologies have revolutionized the field. The “Read-Split-Walk” (RSW) and “Read-Split-Run” (RSR) methods were developed to identify genome-wide non-canonical spliced regions, including special events occurring in the cytoplasm. As a significant amount of genome/transcriptome data, such as that from the Encyclopedia of DNA Elements (ENCODE) project, has been generated, we have advanced a newer, more memory-efficient version of the algorithm, “Read-Split-Fly” (RSF), which can detect non-canonical spliced regions with higher sensitivity and improved speed. The RSF algorithm also outputs the spliced sequences for further downstream biological function analysis. RESULTS: We used open-access ENCODE project RNA-Seq data to search spliced intron sequences against the U12-type spliced intron sequence database to examine whether some events could occur as potential signatures of U12-type splicing. The check was performed by searching spliced sequences against 5’ss and 3’ss sequences from the well-known orthologous U12-type spliceosomal intron database U12DB. Preliminary results of searching 70 ENCODE samples indicated that the presence of 5’ss with a U12-type signature is more frequent than U2-type and prevalent in non-canonical junctions reported by RSF. The selected spliced sequences have also been further studied using miRBase to elucidate their functionality.
Preliminary results from 70 samples of ENCODE datasets show that several miRNAs are prevalent in the studied ENCODE samples. Two of these, hsa-miR-1273 and hsa-miR-548, are associated with many diseases and cancers, as suggested in the literature. CONCLUSIONS: Our RSF pipeline is able to detect many possible junctions (especially those with a high RPKM) with very high overall accuracy and relatively high accuracy for novel junctions. We have incorporated useful parameter features into the pipeline, such as handling variable-length read data and searching spliced sequences for splicing signatures and miRNA events. We suggest that RSF, a tool for identifying novel splicing events, is applicable to studying a range of diseases across biological systems under different experimental conditions. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1801-y) contains supplementary material, which is available to authorized users. BioMed Central 2017-10-03 /pmc/articles/PMC5629565/ /pubmed/28984182 http://dx.doi.org/10.1186/s12859-017-1801-y Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Bai, Yongsheng
Kinne, Jeff
Ding, Lizhong
Rath, Ethan C.
Cox, Aaron
Naidu, Siva Dharman
Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title_full Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title_fullStr Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title_full_unstemmed Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title_short Identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using Read-Split-Fly
title_sort identification of genome-wide non-canonical spliced regions and analysis of biological functions for spliced sequences using read-split-fly
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5629565/
https://www.ncbi.nlm.nih.gov/pubmed/28984182
http://dx.doi.org/10.1186/s12859-017-1801-y
work_keys_str_mv AT baiyongsheng identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly
AT kinnejeff identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly
AT dinglizhong identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly
AT rathethanc identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly
AT coxaaron identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly
AT naidusivadharman identificationofgenomewidenoncanonicalsplicedregionsandanalysisofbiologicalfunctionsforsplicedsequencesusingreadsplitfly