IsoSplitter: identification and characterization of alternative splicing sites without a reference genome

Long-read transcriptome sequencing is designed to sequence full-length RNA molecules and is advantageous for identifying alternative splice isoforms; however, in the absence of a reference genome, it is difficult to accurately locate splice sites because of the diversity of patterns of alternative spli...

Bibliographic Details
Main authors: Wang, Yupeng, Hu, Zhikang, Ye, Ning, Yin, Hengfu
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284324/
https://www.ncbi.nlm.nih.gov/pubmed/34021065
http://dx.doi.org/10.1261/rna.077834.120
_version_ 1783723379024461824
author Wang, Yupeng
Hu, Zhikang
Ye, Ning
Yin, Hengfu
author_facet Wang, Yupeng
Hu, Zhikang
Ye, Ning
Yin, Hengfu
author_sort Wang, Yupeng
collection PubMed
description Long-read transcriptome sequencing is designed to sequence full-length RNA molecules and is advantageous for identifying alternative splice isoforms; however, in the absence of a reference genome, it is difficult to accurately locate splice sites because of the diversity of patterns of alternative splicing (AS). Based on long-read transcriptome data, we developed a versatile tool, IsoSplitter, to reverse-trace and validate AS gene “split sites” with the following features: (i) IsoSplitter initially invokes a modified SIM4 program to find transcript split sites; (ii) each split site is then quantified, to reveal transcript diversity, and putative isoforms are grouped into gene clusters; (iii) an optional step for aligning short reads is provided, to validate split sites by identifying unique junction reads, and to reveal and quantify tissue-specific alternative splice isoforms. We tested IsoSplitter AS prediction using data sets from multiple model and nonmodel plant species and showed that the IsoSplitter pipeline handles different transcriptomes efficiently and with high accuracy. Furthermore, we compared the IsoSplitter pipeline with two splice junction identification tools, the Program to Assemble Spliced Alignments (PASA, which needs a reference genome for AS identification) and AStrap, using data from the model plant Arabidopsis thaliana. We found that IsoSplitter identified more than twice as many AS events as AStrap, and 94.13% of the AS events predicted by IsoSplitter were also identified by the PASA analysis. Starting from a simple sequence file, IsoSplitter is an assembly-free tool for identification and characterization of AS. IsoSplitter is implemented in Python 3.5 on the Linux platform and is freely available at https://github.com/Hengfu-Yin/IsoSplitter.
format Online
Article
Text
id pubmed-8284324
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-8284324 2022-08-01 IsoSplitter: identification and characterization of alternative splicing sites without a reference genome Wang, Yupeng Hu, Zhikang Ye, Ning Yin, Hengfu RNA Bioinformatics Cold Spring Harbor Laboratory Press 2021-08 /pmc/articles/PMC8284324/ /pubmed/34021065 http://dx.doi.org/10.1261/rna.077834.120 Text en © 2021 Wang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at https://creativecommons.org/licenses/by-nc/4.0/.
spellingShingle Bioinformatics
Wang, Yupeng
Hu, Zhikang
Ye, Ning
Yin, Hengfu
IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title_full IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title_fullStr IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title_full_unstemmed IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title_short IsoSplitter: identification and characterization of alternative splicing sites without a reference genome
title_sort isosplitter: identification and characterization of alternative splicing sites without a reference genome
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8284324/
https://www.ncbi.nlm.nih.gov/pubmed/34021065
http://dx.doi.org/10.1261/rna.077834.120
work_keys_str_mv AT wangyupeng isosplitteridentificationandcharacterizationofalternativesplicingsiteswithoutareferencegenome
AT huzhikang isosplitteridentificationandcharacterizationofalternativesplicingsiteswithoutareferencegenome
AT yening isosplitteridentificationandcharacterizationofalternativesplicingsiteswithoutareferencegenome
AT yinhengfu isosplitteridentificationandcharacterizationofalternativesplicingsiteswithoutareferencegenome
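
Illustrative sketch (not part of the catalog record). The description outlines three steps: finding split sites between transcripts, quantifying each site while grouping putative isoforms into gene clusters, and optionally validating sites with short junction reads. The Python below is a minimal sketch of the last two ideas under stated assumptions. It is not IsoSplitter's code and does not reproduce SIM4 output: the input tuples `(transcript_a, transcript_b, site)`, the function names, and the `min_overhang` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cluster_isoforms(pairs):
    """Group transcripts into clusters: any two transcripts linked by a
    pairwise hit end up in the same cluster (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b, _site in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for transcript in list(parent):
        clusters[find(transcript)].add(transcript)
    return sorted(sorted(members) for members in clusters.values())


def count_split_sites(pairs):
    """Count how often each (transcript, position) split site is observed."""
    return Counter((a, site) for a, _b, site in pairs)


def junction_supported(read_start, read_end, site, min_overhang=8):
    """A short read supports a split site if it spans the site with at
    least min_overhang bases on each side (hypothetical criterion)."""
    return (site - read_start >= min_overhang
            and read_end - site >= min_overhang)


# Hypothetical pairwise hits: (transcript_a, transcript_b, split position on a)
pairs = [
    ("t1", "t2", 120),
    ("t2", "t3", 300),
    ("t4", "t5", 50),
    ("t1", "t3", 120),
]
clusters = cluster_isoforms(pairs)
site_counts = count_split_sites(pairs)
```

With these toy inputs, `t1`, `t2`, and `t3` fall into one cluster and `t4`, `t5` into another, and the site at position 120 on `t1` is counted twice. A real pipeline would derive the pairs from alignments and would need to handle strand, coordinates on both transcripts, and multi-mapping reads, none of which this sketch attempts.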