Cargando…

SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data

BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into mon...

Descripción completa

Detalles Bibliográficos
Autores principales: Wenzel, Marius A., Müller, Berndt, Pettitt, Jonathan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986045/
https://www.ncbi.nlm.nih.gov/pubmed/33752599
http://dx.doi.org/10.1186/s12859-021-04009-7
_version_ 1783668364088967168
author Wenzel, Marius A.
Müller, Berndt
Pettitt, Jonathan
author_facet Wenzel, Marius A.
Müller, Berndt
Pettitt, Jonathan
author_sort Wenzel, Marius A.
collection PubMed
description BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04009-7.
format Online
Article
Text
id pubmed-7986045
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-79860452021-03-24 SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data Wenzel, Marius A. Müller, Berndt Pettitt, Jonathan BMC Bioinformatics Software BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04009-7. BioMed Central 2021-03-22 /pmc/articles/PMC7986045/ /pubmed/33752599 http://dx.doi.org/10.1186/s12859-021-04009-7 Text en © The Author(s) 2021 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Wenzel, Marius A.
Müller, Berndt
Pettitt, Jonathan
SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title_full SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title_fullStr SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title_full_unstemmed SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title_short SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
title_sort slidr and sloppr: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from rna-seq data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986045/
https://www.ncbi.nlm.nih.gov/pubmed/33752599
http://dx.doi.org/10.1186/s12859-021-04009-7
work_keys_str_mv AT wenzelmariusa slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata
AT mullerberndt slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata
AT pettittjonathan slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata