Cargando…
SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data
BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into mon...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986045/ https://www.ncbi.nlm.nih.gov/pubmed/33752599 http://dx.doi.org/10.1186/s12859-021-04009-7 |
_version_ | 1783668364088967168 |
---|---|
author | Wenzel, Marius A. Müller, Berndt Pettitt, Jonathan |
author_facet | Wenzel, Marius A. Müller, Berndt Pettitt, Jonathan |
author_sort | Wenzel, Marius A. |
collection | PubMed |
description | BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04009-7. |
format | Online Article Text |
id | pubmed-7986045 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-79860452021-03-24 SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data Wenzel, Marius A. Müller, Berndt Pettitt, Jonathan BMC Bioinformatics Software BACKGROUND: Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. RESULTS: Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. CONCLUSIONS: SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04009-7. BioMed Central 2021-03-22 /pmc/articles/PMC7986045/ /pubmed/33752599 http://dx.doi.org/10.1186/s12859-021-04009-7 Text en © The Author(s) 2021 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Wenzel, Marius A. Müller, Berndt Pettitt, Jonathan SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title | SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title_full | SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title_fullStr | SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title_full_unstemmed | SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title_short | SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data |
title_sort | slidr and sloppr: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from rna-seq data |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986045/ https://www.ncbi.nlm.nih.gov/pubmed/33752599 http://dx.doi.org/10.1186/s12859-021-04009-7 |
work_keys_str_mv | AT wenzelmariusa slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata AT mullerberndt slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata AT pettittjonathan slidrandslopprflexibleidentificationofsplicedleadertranssplicingandpredictionofeukaryoticoperonsfromrnaseqdata |