IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
Main authors: | , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284324/ https://www.ncbi.nlm.nih.gov/pubmed/34021065 http://dx.doi.org/10.1261/rna.077834.120 |
Summary: | Long-read transcriptome sequencing is designed to sequence full-length RNA molecules and is advantageous for identifying alternative splice isoforms; however, in the absence of a reference genome, it is difficult to locate splice sites accurately because the patterns of alternative splicing (AS) are so diverse. Based on long-read transcriptome data, we developed a versatile tool, IsoSplitter, to reverse-trace and validate AS gene “split sites” with the following features: (i) IsoSplitter initially invokes a modified SIM4 program to find transcript split sites; (ii) each split site is then quantified to reveal transcript diversity, and putative isoforms are grouped into gene clusters; (iii) an optional step for aligning short reads is provided to validate split sites by identifying unique junction reads and to reveal and quantify tissue-specific alternative splice isoforms. We tested IsoSplitter AS prediction using data sets from multiple model and nonmodel plant species and showed that the IsoSplitter pipeline handles different transcriptomes efficiently and with high accuracy. Furthermore, we compared the IsoSplitter pipeline with two splice junction identification tools, Program to Assemble Spliced Alignments (PASA, which needs a reference genome for AS identification) and AStrap, using data from the model plant Arabidopsis thaliana. IsoSplitter identified more than twice as many AS events as AStrap, and 94.13% of the IsoSplitter-predicted AS events were also identified by PASA. Starting from a simple sequence file, IsoSplitter is an assembly-free tool for the identification and characterization of AS. IsoSplitter is implemented in Python 3.5 on the Linux platform and is freely available at https://github.com/Hengfu-Yin/IsoSplitter. |
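The optional validation step in the summary (counting short reads that span a split site) can be illustrated with a minimal sketch. This is not IsoSplitter's code: the `Alignment` record, the `count_junction_reads` function, the `min_overhang` parameter, and the coordinate convention are all hypothetical, chosen only to show the general idea of junction-read support under the assumption that short reads have already been aligned to the long-read transcripts.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical record for one short read aligned to a transcript."""
    transcript_id: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def count_junction_reads(
    split_sites: Dict[str, List[int]],
    alignments: Iterable[Alignment],
    min_overhang: int = 8,  # assumed threshold, not taken from the paper
) -> Counter:
    """Count reads that span each split site with enough sequence on both sides.

    A split site at position p is taken to lie between bases p-1 and p.
    A read supports it only if it covers at least `min_overhang` bases
    on each side of that boundary.
    """
    support: Counter = Counter()
    for aln in alignments:
        for site in split_sites.get(aln.transcript_id, ()):
            spans_left = aln.start <= site - min_overhang
            spans_right = aln.end >= site + min_overhang
            if spans_left and spans_right:
                key: Tuple[str, int] = (aln.transcript_id, site)
                support[key] += 1
    return support


# Toy data: two candidate split sites on one transcript.
example_sites = {"tx1": [100, 250]}
example_alignments = [
    Alignment("tx1", 80, 130),   # spans site 100 with ample overhang
    Alignment("tx1", 95, 145),   # only 5 bases left of site 100: rejected
    Alignment("tx1", 240, 300),  # spans site 250
    Alignment("tx2", 90, 120),   # transcript without split sites
]
example_support = count_junction_reads(example_sites, example_alignments)
```

Requiring an overhang on both sides is a common way to avoid counting reads that merely touch a boundary; a split site with zero supporting reads in this scheme would remain unvalidated rather than be discarded.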