ExUTR: a novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data

Bibliographic Details
Main authors: Huang, Zixia, Teeling, Emma C.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5674806/
https://www.ncbi.nlm.nih.gov/pubmed/29110697
http://dx.doi.org/10.1186/s12864-017-4241-1
author Huang, Zixia
Teeling, Emma C.
collection PubMed
description BACKGROUND: The three prime untranslated region (3′-UTR) is known to play a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3′-UTR. However, 3′-UTR biology seems to be a relatively untapped field, with only limited tools and 3′-UTR resources available. To elucidate the regulatory mechanisms of the 3′-UTR on gene expression, the 3′-UTR sequences must first be identified. Current 3′-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3′-UTR sequences, which hinders their application to the many non-model organisms whose genomes are not available. To address these issues, the establishment of an NGS-based, automated pipeline is urgently needed for genome-wide 3′-UTR prediction in the absence of reference genomes. RESULTS: Here, we propose ExUTR, a novel NGS-based pipeline to predict and retrieve 3′-UTR sequences from RNA-Seq experiments, particularly designed for non-model species lacking well-annotated genomes. This pipeline integrates cutting-edge bioinformatics tools, databases (UniProt and UTRdb) and novel in-house Perl scripts, implementing a fully automated workflow. By taking transcriptome assemblies as inputs, this pipeline identifies 3′-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3′-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources, which facilitates its implementation on a standard desktop computer with reasonable runtime, making it affordable for most laboratories. We also demonstrate the functionality and extensibility of this pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3′-UTRs using both well-characterized 3′-UTR resources and 3P-Seq data. CONCLUSIONS: ExUTR is a practical and powerful workflow that enables rapid genome-wide 3′-UTR discovery from NGS data. The candidates predicted through this pipeline will further advance the study of miRNA target prediction, cis elements in 3′-UTRs and the evolution and biology of 3′-UTRs. Being independent of a well-annotated reference genome will dramatically expand its application to a much broader research area, encompassing all species for which RNA-Seq is available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-017-4241-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5674806
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5674806 2017-11-15 ExUTR: a novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data Huang, Zixia Teeling, Emma C. BMC Genomics Software BioMed Central 2017-11-06 /pmc/articles/PMC5674806/ /pubmed/29110697 http://dx.doi.org/10.1186/s12864-017-4241-1 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ExUTR: a novel pipeline for large-scale prediction of 3′-UTR sequences from NGS data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5674806/
https://www.ncbi.nlm.nih.gov/pubmed/29110697
http://dx.doi.org/10.1186/s12864-017-4241-1